Linux RDMA and InfiniBand development

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
From: Sagi Grimberg @ 2016-11-01 16:34 UTC (permalink / raw)
  To: Steve Wise, 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <01cc01d2345b$d445acd0$7cd10670$@opengridcomputing.com>


>>> I just hit an nvmf target NULL pointer deref BUG after a few hours of
> keep-alive
>>> timeout testing.  It appears that nvmet_rdma_cm_handler() was called with
>>> cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue is left as
>>> NULL.  But then nvmet_rdma_queue_disconnect() is called with queue == NULL
>> which
>>> causes the crash.
>>
>> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
>> get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown
>> paths does not nullify cm_id->qp...
>
> rdma_destroy_qp() nulls out cm_id->qp.

pphh, somehow managed to miss it...

So we have a case where we can call rdma_destroy_qp and
then rdma_destroy_id but still get events on the cm_id...
Not very nice...

So I think that the patch from Bart a few weeks ago was correct:

---
  drivers/nvme/target/rdma.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index d1aea17..a61e47f 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1354,9 +1354,12 @@ static int nvmet_rdma_cm_handler(struct 
rdma_cm_id *cm_id,
  		break;
  	case RDMA_CM_EVENT_ADDR_CHANGE:
  	case RDMA_CM_EVENT_DISCONNECTED:
-	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
  		nvmet_rdma_queue_disconnect(queue);
  		break;
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
+		if (queue)
+			nvmet_rdma_queue_disconnect(queue);
+		break;
  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
  		ret = nvmet_rdma_device_removal(cm_id, queue);
  		break;
---

In case this fixes the issue (as expected) I'll queue it up
with a change log and a code comment on why we need to do
this (and include all the relevant cases around it)...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* RE: nvmet_rdma crash - DISCONNECT event with NULL queue
From: Steve Wise @ 2016-11-01 16:20 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Christoph Hellwig'
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <6f42d056-284d-00fc-2b98-189f54957980-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

> > I just hit an nvmf target NULL pointer deref BUG after a few hours of
keep-alive
> > timeout testing.  It appears that nvmet_rdma_cm_handler() was called with
> > cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue is left as
> > NULL.  But then nvmet_rdma_queue_disconnect() is called with queue == NULL
> which
> > causes the crash.
> 
> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
> get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown
> paths does not nullify cm_id->qp...

rdma_destroy_qp() nulls out cm_id->qp.

> 
> Are you sure that the flow is indeed DISCONNECTED event?
>

That is the only event I would expect during this test.  From
nvmet_rdma_cm_handler():

        case RDMA_CM_EVENT_ADDR_CHANGE:
        case RDMA_CM_EVENT_DISCONNECTED:
        case RDMA_CM_EVENT_TIMEWAIT_EXIT:
                nvmet_rdma_queue_disconnect(queue);
                break;

 
> > In the log, I see that the target side keep-alive fired:
> >
> > [20676.867545] eth2: link up, 40Gbps, full-duplex, Tx/Rx PAUSE
> > [20677.079669] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> > [20677.079684] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> 
> Wow, two keep-alive timeouts on the same controller? that is
> seriously wrong...
> 
> 
> > [20677.088402] nvmet_rdma: freeing queue 276
> > [20677.090981] BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000120
> > [20677.090988] IP: [<ffffffffa084b6b4>]
> nvmet_rdma_queue_disconnect+0x24/0x90
> > [nvmet_rdma]
> 
> No stack trace?
> 

Sure:

[20677.090992] Oops: 0000 [#1] SMP
[20677.091030] Modules linked in: iw_cxgb4(OE) cxgb4(OE) null_blk brd nvmet_rdma
nvmet ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg ipmi_si
ipmi_msghandler nvme nvme_core lpc_ich mfd_core
[20677.091040]  mei_me mei igb dca ptp pps_core wmi ext4(E) mbcache(E) jbd2(E)
sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E) ttm(E) drm_kms_helper(E)
drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E)
i2c_core(E) [last unloaded: cxgb4]
[20677.091043] CPU: 10 PID: 24331 Comm: kworker/u64:1 Tainted: G           OE
4.8.0 #130
[20677.091044] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[20677.091049] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[20677.091050] task: ffff88102fa4ab80 task.stack: ffff880f111d8000
[20677.091053] RIP: 0010:[<ffffffffa084b6b4>]  [<ffffffffa084b6b4>]
nvmet_rdma_queue_disconnect+0x24/0x90 [nvmet_rdma]
[20677.091054] RSP: 0018:ffff880f111db968  EFLAGS: 00010296
[20677.091055] RAX: ffff88102fa4ab80 RBX: 0000000000000000 RCX: 0000000000000000
[20677.091056] RDX: 0000000000000005 RSI: 000000000000000a RDI: ffffffffa084f040
[20677.091056] RBP: ffff880f111db998 R08: ffff88102a30e358 R09: ffff880f111dba78
[20677.091058] R10: 0000000000000c2c R11: 0000000000000001 R12: 0000000000000000
[20677.091059] R13: ffff880805ca2c00 R14: ffff880f111db9e8 R15: ffff880f111dbc48
[20677.091060] FS:  0000000000000000(0000) GS:ffff88105f280000(0000)
knlGS:0000000000000000
[20677.091061] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20677.091062] CR2: 0000000000000120 CR3: 000000105a203000 CR4: 00000000000406e0
[20677.091063] Stack:
[20677.091065]  ffff881053f6c100 0000000000000046 000012ce42a8b7b3
ffff88105bdf7000
[20677.091067]  ffff880f111db9e8 0000000000000000 ffff880f111db9c8
ffffffffa084d484
[20677.091069]  ffff880f111dbb28 ffff880805ca2c00 ffff880805ca2e70
ffff880f111db9e8
[20677.091070] Call Trace:
[20677.091074]  [<ffffffffa084d484>] nvmet_rdma_cm_handler+0x154/0x1a0
[nvmet_rdma]
[20677.091078]  [<ffffffffa0737e90>] cma_iw_handler+0x110/0x1c0 [rdma_cm]
[20677.091084]  [<ffffffff8115575e>] ? ring_buffer_lock_reserve+0x7e/0xc0
[20677.091086]  [<ffffffffa07271b3>] cm_close_handler+0x93/0xc0 [iw_cm]
[20677.091089]  [<ffffffffa0729027>] process_event+0xd7/0xf0 [iw_cm]
[20677.091092]  [<ffffffffa0729170>] cm_work_handler+0x130/0x190 [iw_cm]
[20677.091097]  [<ffffffff8109d196>] ?
trace_event_raw_event_workqueue_execute_start+0x66/0xa0
[20677.091099]  [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[20677.091103]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[20677.091106]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[20677.091108]  [<ffffffff810a211d>] worker_thread+0x16d/0x530
[20677.091110]  [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[20677.091115]  [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[20677.091117]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[20677.091119]  [<ffffffff816df260>] ? schedule+0x40/0xb0
[20677.091121]  [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[20677.091123]  [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[20677.091126]  [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[20677.091129]  [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[20677.091131]  [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[20677.091149] Code: 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d f0 4c
89 65 f8 66 66 66 66 90 48 89 fb 48 c7 c7 40 f0 84 a0 e8 1c 55 e9 e0 <48> 8b 83
20 01 00 00 4c 8d a3 20 01 00 00 49 39 c4 48 89 45 e0
[20677.091152] RIP  [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90
[nvmet_rdma]
[20677.091152]  RSP <ffff880f111db968>
[20677.091153] CR2: 0000000000000120

> >
> >
> > So maybe there is just a race in that keep-alive can free the queue and yet
a
> > DISCONNECTED event still received on the cm_id after the queue is freed?
> 
> rdma_destroy_id should barrier this scenario.

I'm guessing the cm_id hasn't been destroyed.  But the qp has...


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: nvmet_rdma crash - DISCONNECT event with NULL queue
From: Sagi Grimberg @ 2016-11-01 16:15 UTC (permalink / raw)
  To: Steve Wise, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <01b401d23458$af277210$0d765630$@opengridcomputing.com>


> Hey guys,
>
> I just hit an nvmf target NULL pointer deref BUG after a few hours of keep-alive
> timeout testing.  It appears that nvmet_rdma_cm_handler() was called with
> cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue is left as
> NULL.  But then nvmet_rdma_queue_disconnect() is called with queue == NULL which
> causes the crash.

AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown
paths does not nullify cm_id->qp...

Are you sure that the flow is indeed DISCONNECTED event?

> In the log, I see that the target side keep-alive fired:
>
> [20676.867545] eth2: link up, 40Gbps, full-duplex, Tx/Rx PAUSE
> [20677.079669] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [20677.079684] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!

Wow, two keep-alive timeouts on the same controller? that is
seriously wrong...


> [20677.088402] nvmet_rdma: freeing queue 276
> [20677.090981] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000120
> [20677.090988] IP: [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90
> [nvmet_rdma]

No stack trace?

>
>
> So maybe there is just a race in that keep-alive can free the queue and yet a
> DISCONNECTED event still received on the cm_id after the queue is freed?

rdma_destroy_id should barrier this scenario.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 12/14] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Martin K. Petersen @ 2016-11-01 16:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Konrad Rzeszutek Wilk, Roger Pau Monné,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <32accadb-3ed3-cd5d-e92f-655f8f3cb9bc@sandisk.com>

>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:

Bart> Ensure that if scsi-mq is enabled that
Bart> scsi_internal_device_block() waits until ongoing
Bart> shost->hostt->queuecommand() calls have finished.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH v5 11/14] SRP transport: Move queuecommand() wait code to SCSI core
From: Martin K. Petersen @ 2016-11-01 16:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Konrad Rzeszutek Wilk, Roger Pau Monné,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <b447e9bc-7e29-d058-76ad-89ba2a7bf729-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

>>>>> "Bart" == Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> writes:

Bart> Additionally, rename srp_wait_for_queuecommand() into
Bart> scsi_wait_for_queuecommand() and add a comment about the
Bart> queuecommand() call from scsi_send_eh_cmnd().

Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

-- 
Martin K. Petersen	Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 13/14] nvme: Fix a race condition related to stopping queues
From: Sagi Grimberg @ 2016-11-01 16:03 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <3a8cc957-ea9e-fe1d-f325-ad9ca1aae8a8@sandisk.com>

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply

* Re: [PATCH v5 12/14] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Sagi Grimberg @ 2016-11-01 16:03 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <32accadb-3ed3-cd5d-e92f-655f8f3cb9bc-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

and again,

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 11/14] SRP transport: Move queuecommand() wait code to SCSI core
From: Sagi Grimberg @ 2016-11-01 16:03 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <b447e9bc-7e29-d058-76ad-89ba2a7bf729@sandisk.com>

Again,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply

* Re: [PATCH v5 08/14] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()
From: Sagi Grimberg @ 2016-11-01 16:02 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <01d525ec-5b5b-b22d-4b17-36d7406920ed@sandisk.com>

Looks useful,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply

* Re: [PATCH v5 07/14] blk-mq: Introduce blk_mq_quiesce_queue()
From: Sagi Grimberg @ 2016-11-01 16:02 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <81a7e98c-51b0-99f0-c143-303425508bff@sandisk.com>

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply

* Re: [PATCH v5 06/14] blk-mq: Remove blk_mq_cancel_requeue_work()
From: Sagi Grimberg @ 2016-11-01 16:01 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
In-Reply-To: <7207b8ad-aec9-b7d8-e36b-8c360bf32f06@sandisk.com>

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply

* Re: [PATCH v5 05/14] blk-mq: Avoid that requeueing starts stopped queues
From: Sagi Grimberg @ 2016-11-01 16:01 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <8d5b97cb-429a-e451-ad2e-ad63a3f94dfd-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* nvmet_rdma crash - DISCONNECT event with NULL queue
From: Steve Wise @ 2016-11-01 15:57 UTC (permalink / raw)
  To: sagig, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hey guys,

I just hit an nvmf target NULL pointer deref BUG after a few hours of keep-alive
timeout testing.  It appears that nvmet_rdma_cm_handler() was called with
cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue is left as
NULL.  But then nvmet_rdma_queue_disconnect() is called with queue == NULL which
causes the crash.

In the log, I see that the target side keep-alive fired:

[20676.867545] eth2: link up, 40Gbps, full-duplex, Tx/Rx PAUSE
[20677.079669] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[20677.079684] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!

Then all the queues are freed followed by the crash.

[20677.080066] nvmet_rdma: freeing queue 222
[20677.080074] nvmet_rdma: sending cmd response failed
[20677.080351] nvmet_rdma: freeing queue 227
[20677.080775] nvmet_rdma: freeing queue 230
[20677.081137] nvmet_rdma: freeing queue 232
[20677.081371] nvmet_rdma: freeing queue 234
[20677.081604] nvmet_rdma: freeing queue 236
[20677.081835] nvmet_rdma: freeing queue 237
[20677.082062] nvmet_rdma: freeing queue 238
[20677.082106] nvmet_rdma: freeing queue 239
[20677.082366] nvmet_rdma: freeing queue 240
[20677.082570] nvmet_rdma: freeing queue 241
[20677.082995] nvmet_rdma: freeing queue 242
[20677.083222] nvmet_rdma: freeing queue 243
[20677.083475] nvmet_rdma: freeing queue 244
[20677.083522] nvmet_rdma: freeing queue 245
[20677.083801] nvmet_rdma: freeing queue 246
[20677.084264] nvmet_rdma: freeing queue 247
[20677.084307] nvmet_rdma: freeing queue 248
[20677.084501] nvmet_rdma: freeing queue 249
[20677.084846] nvmet_rdma: freeing queue 250
[20677.085184] nvmet_rdma: freeing queue 252
[20677.085500] nvmet_rdma: freeing queue 254
[20677.085733] nvmet_rdma: freeing queue 256
[20677.085997] nvmet_rdma: freeing queue 258
[20677.086224] nvmet_rdma: freeing queue 260
[20677.086517] nvmet_rdma: freeing queue 262
[20677.086768] nvmet_rdma: freeing queue 264
[20677.087031] nvmet_rdma: freeing queue 266
[20677.087359] nvmet_rdma: freeing queue 268
[20677.087567] nvmet_rdma: freeing queue 270
[20677.087821] nvmet_rdma: freeing queue 272
[20677.088162] nvmet_rdma: freeing queue 274
[20677.088402] nvmet_rdma: freeing queue 276
[20677.090981] BUG: unable to handle kernel NULL pointer dereference at
0000000000000120
[20677.090988] IP: [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90
[nvmet_rdma]


So maybe there is just a race in that keep-alive can free the queue and yet a
DISCONNECTED event still received on the cm_id after the queue is freed?

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH] qedr: Fix possible memory leak in qedr_create_qp()
From: Amrani, Ram @ 2016-11-01 14:42 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Wei Yongjun, Doug Ledford, Borundia, Rajesh, Wei Yongjun,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161031053318.GU3617-2ukJVAZIZ/Y@public.gmane.org>

> While looking on this patch and associated code to it, I noticed the 
> following code stack:
> 
> qedr_create_qp
> 	-->
> 	dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
> 		-->
> 		qed_rdma_destroy_qp
> 			-->
> 			qed_roce_destroy_qp
> This function will check the QP state and return -INVAL and comment 
> that this QP needs to be prepared before destroying it.
> 
> However immediately after returning, you are calling to kfree(qp) 
> without any checks.
> 
> It looks like an error and it is worth to take a look on it.
>

That's a deep  level of reading... thanks.

When the QP is created its state is set in ecore_rdma_create_qp():
	qp->cur_state = ECORE_ROCE_QP_STATE_RESET;

When it is ecore_roce_destroy_qp() is invoked, the function *must* be in either RESET or two other states:
	if ((qp->cur_state != QED_ROCE_QP_STATE_RESET) &&
	    (qp->cur_state != QED_ROCE_QP_STATE_ERR) &&
	    (qp->cur_state != QED_ROCE_QP_STATE_INIT)) {
		DP_NOTICE(p_hwfn,
			  "QP must be in error, reset or init state before destroying it\n");
		return -EINVAL;
	}
So actually, we won't return -INVAL here.

The bug I see is that I see in our upstream code is that for RESET the normal "destroy" operations continue. But they shouldn't.
We need here something like this:
	if (qp->cur_state == ECORE_ROCE_QP_STATE_RESET)
		return 0;

Flow will return to qed_rdma_destroy_qp() that will release the qp resource in the qed_roce scope (our real purpose).
And then return to qedr_create_qp() where the qp resource will be released in the qedr scope.

And as a side issue - an improvement that can be added is to return the error code of the QP create and not of the QP destroy.

I'll the first later on and probably the second too.



> And did I miss the fix to memory leak posted during code review?
> 
As far as I know, I have supplied patches for all memory leaks. Can you direct me to a specific e-mail?

Thanks,
Ram

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Yuval Shaia @ 2016-11-01 14:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit
In-Reply-To: <20161101141454.GD3617-2ukJVAZIZ/Y@public.gmane.org>

On Tue, Nov 01, 2016 at 04:14:54PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 01, 2016 at 03:48:24PM +0200, Yuval Shaia wrote:
> > On Tue, Nov 01, 2016 at 02:34:13PM +0200, Leon Romanovsky wrote:
> > > From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >
> > > Currently when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO it warns
> > > on every QP creation, It becomes worse when driver works in connected mode
> > > we will see one print on each new connection, instead do it once.
> > >
> > > Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
> > > Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > > ---
> > >  drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > index 4ad297d..917393b 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > @@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
> > >
> > >  	tx_qp = ib_create_qp(priv->pd, &attr);
> > >  	if (PTR_ERR(tx_qp) == -EINVAL) {
> > > -		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > > -			   priv->ca->name);
> > > +		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > > +			     priv->ca->name);
> >
> > But it will still re-print it for different device, right?
> 
> Good question,
> 
> pr_warn_once is defined as alias to printk_once [1]. That printk_once is
> macro too [2] which will define local static read_once variable.
> 
> [1] http://lxr.free-electrons.com/source/include/linux/printk.h#L359
> [2] http://lxr.free-electrons.com/source/include/linux/printk.h#L322

If only one HCA model is installed on the system then it should be fine,
but wonder if more then one, would we like to see the warning again? i think
yes.

> 
> >
> > >  		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
> > >  		tx_qp = ib_create_qp(priv->pd, &attr);
> > >  	}
> > > --
> > > 2.7.4
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH v2 perftest] Support for Chelsio T6 devices
From: Steve Wise @ 2016-11-01 14:16 UTC (permalink / raw)
  To: 'Leon Romanovsky'
  Cc: 'Gil Rockah', 'Zohar Ben Aharon',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161031231124.GA3617-2ukJVAZIZ/Y@public.gmane.org>

> Hi Steve,
> I merged both of your patches
> https://patchwork.kernel.org/patch/9341763/
> https://patchwork.kernel.org/patch/9219151/
> https://github.com/linux-rdma/perftest/pull/1
> 
> Sorry for the delay.

Thanks all!

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Leon Romanovsky @ 2016-11-01 14:14 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit
In-Reply-To: <20161101134823.GA6681-Hxa29pjIrERMGUUWBy6pNA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]

On Tue, Nov 01, 2016 at 03:48:24PM +0200, Yuval Shaia wrote:
> On Tue, Nov 01, 2016 at 02:34:13PM +0200, Leon Romanovsky wrote:
> > From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > Currently when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO it warns
> > on every QP creation, It becomes worse when driver works in connected mode
> > we will see one print on each new connection, instead do it once.
> >
> > Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
> > Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > index 4ad297d..917393b 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > @@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
> >
> >  	tx_qp = ib_create_qp(priv->pd, &attr);
> >  	if (PTR_ERR(tx_qp) == -EINVAL) {
> > -		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > -			   priv->ca->name);
> > +		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> > +			     priv->ca->name);
>
> But it will still re-print it for different device, right?

Good question,

pr_warn_once is defined as alias to printk_once [1]. That printk_once is
macro too [2] which will define local static read_once variable.

[1] http://lxr.free-electrons.com/source/include/linux/printk.h#L359
[2] http://lxr.free-electrons.com/source/include/linux/printk.h#L322

>
> >  		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
> >  		tx_qp = ib_create_qp(priv->pd, &attr);
> >  	}
> > --
> > 2.7.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* RE: [PATCH] qedr: Fix missing unlock on error in qedr_post_send()
From: Amrani, Ram @ 2016-11-01 14:09 UTC (permalink / raw)
  To: Leon Romanovsky, Wei Yongjun
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, Borundia, Rajesh,
	Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161031053835.GV3617-2ukJVAZIZ/Y@public.gmane.org>

> > index a615142..e7c7417 100644
> > --- a/drivers/infiniband/hw/qedr/verbs.c
> > +++ b/drivers/infiniband/hw/qedr/verbs.c
> > @@ -2983,7 +2983,8 @@ int qedr_post_send(struct ib_qp *ibqp, struct
> > ib_send_wr *wr,
> >
> >  	if (!wr) {
> >  		DP_ERR(dev, "Got an empty post send.\n");
> > -		return -EINVAL;
> > +		rc = -EINVAL;
> > +		goto out_unlock;
> >  	}
> 
> IMHO, this if needs to be moved to be before acquiring spinlock and avoid
> introducing new labels for this one case only.
> 

Thanks Wei and Leon.

Actually, perhaps we can totally remove the check itself -
Since this is kernel space, is it safe to presume that all ULPs are trusted to be well coded?
(if not, and as a side note, I see that in MLX4 there's a for(..;wr;..) loop, but that wr is 
dereferenced earlier so perhaps this is a potential bug)

Ram


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Leon Romanovsky @ 2016-11-01 14:07 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Li Zefan,
	Johannes Weiner, Doug Ledford, Christoph Hellwig, Liran Liss,
	Hefty, Sean, Jason Gunthorpe, Haggai Eran,
	james.l.morris-QHcLZuEGTsvQT0dZR+AlfA, Or Gerlitz, Matan Barak
In-Reply-To: <CAG53R5VKwntDHX101+5aaGoyKMKQuiKQWam575iFAxhmKxhE1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 582 bytes --]

On Tue, Nov 01, 2016 at 04:33:23PM +0530, Parav Pandit wrote:
>
> > 4. Cgroup configuration should be as close as possible to "standard" if
> > such exists, so all infinite internet guides will work for RDMA too.
> I didnt follow this comment. Can you please explain? Are you saying
> rdma cgroup should have define all the objects of IB spec?

It is not related to spec at all. There were comments from Tejun and you that
other cgroups (CPU, ...) have different semantics and RDMA has something unique
(I don't remember what was it). I want to see minimal uniqueness RDMA cgroups.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Leon Romanovsky @ 2016-11-01 14:01 UTC (permalink / raw)
  To: Amrani, Ram
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <SN1PR07MB2207881FAE3C35859A8EDED5F8A10-mikhvbZlbf8TSoR2DauN2+FPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 554 bytes --]

On Tue, Nov 01, 2016 at 01:34:23PM +0000, Amrani, Ram wrote:
> > We use "rdma-core" notations for patches intended to consolidated library,
> > while your patch is for the kernel.
>
> ACK. My bad. Shall I re-send it?

No, there is no need.
The information from [..] is stripped from the actual applied patch.

Thanks

>
> Ram
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Yuval Shaia @ 2016-11-01 13:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit
In-Reply-To: <1478003653-16248-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Tue, Nov 01, 2016 at 02:34:13PM +0200, Leon Romanovsky wrote:
> From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Currently when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO it warns
> on every QP creation, It becomes worse when driver works in connected mode
> we will see one print on each new connection, instead do it once.
> 
> Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
> Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 4ad297d..917393b 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
>  
>  	tx_qp = ib_create_qp(priv->pd, &attr);
>  	if (PTR_ERR(tx_qp) == -EINVAL) {
> -		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> -			   priv->ca->name);
> +		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> +			     priv->ca->name);

But it will still re-print it for different device, right?

>  		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
>  		tx_qp = ib_create_qp(priv->pd, &attr);
>  	}
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH rdma-core] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-11-01 13:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161030205026.GR3617-2ukJVAZIZ/Y@public.gmane.org>

> We use "rdma-core" notations for patches intended to consolidated library,
> while your patch is for the kernel.

ACK. My bad. Shall I re-send it?

Ram

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH rdma-rc] IB/ipoib: print only once when doesn't support IB_QP_CREATE_USE_GFP_NOIO
From: Leon Romanovsky @ 2016-11-01 12:34 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Erez Shitrit

From: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently when the card doesn't support IB_QP_CREATE_USE_GFP_NOIO it warns
on every QP creation, It becomes worse when driver works in connected mode
we will see one print on each new connection, instead do it once.

Fixes: 09b93088d7 ('Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4ad297d..917393b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1053,8 +1053,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
 
 	tx_qp = ib_create_qp(priv->pd, &attr);
 	if (PTR_ERR(tx_qp) == -EINVAL) {
-		ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
-			   priv->ca->name);
+		pr_warn_once("can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
+			     priv->ca->name);
 		attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
 		tx_qp = ib_create_qp(priv->pd, &attr);
 	}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Parav Pandit @ 2016-11-01 11:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Li Zefan,
	Johannes Weiner, Doug Ledford, Christoph Hellwig, Liran Liss,
	Hefty, Sean, Jason Gunthorpe, Haggai Eran,
	james.l.morris-QHcLZuEGTsvQT0dZR+AlfA, Or Gerlitz, Matan Barak
In-Reply-To: <20161031065441.GY3617-2ukJVAZIZ/Y@public.gmane.org>

On Mon, Oct 31, 2016 at 12:24 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Oct 20, 2016 at 01:48:27AM +0530, Parav Pandit wrote:
>> On Thu, Oct 20, 2016 at 1:35 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > Hello, Parav.
>> >
>> > On Thu, Oct 20, 2016 at 01:24:42AM +0530, Parav Pandit wrote:
>> >> userland can get the max numbers using other framework which is used
>> >> by control & data plane available in C library form or in form of
>> >> system tools.
>> >> I was preferring to get and set through same interface because,
>> >> It simplifies user land software which is often not written in C so
>> >> its likely that it needs to rely on system tools and parse the
>> >> content, iterate through devices etc.
>> >> Getting these info through rdma.max just makes it simple. There will
>> >> be logic built to read/write rdma.max in userland anyway, which can be
>> >> leveraged for percentage calculation instead of doing it from two
>> >> places.
>> >
>> > Yeah, I get that this can be convenient in this case but it isn't a
>> > generic approach.  I'd much prefer keeping it in line with other
>> > resources.
>> >
>> Hmm. we don't have /proc/sys/kernel/pid_max type of simple interface
>> to get the max values for rdma resources.
>> rdma.max is close to that simplicity.
>
> Sorry for my late response (very long weekends and piles of mails after it) and
> for not clarifying our requirements better, which are very simple.
>
> 1. We will have vendor specific vendors objects in the future (new ABI
> support it and designed for that).
I will let others comments on it. The patch_v11 design was allowing
vendor specific objects and standard objects to be defined in IB core
and rdma cgroup was facilitator to do so. We didn't reach consensus on
that approach.

> 2. We don't want to fight for every addition of such objects to cgroup list.
Ditto comment as above.

> 3. We don't want to teach and/or rewrite scripts for "average" user after
> addition of new objects.
This we can possibly do by having new rdma.percentage knob, which gets
configured by default for every new object in rdma cgroup.
This way average user/administrator doesn't have to know about it.

> 4. Cgroup configuration should be as close as possible to "standard" if
> such exists, so all infinite internet guides will work for RDMA too.
I didnt follow this comment. Can you please explain? Are you saying
rdma cgroup should have define all the objects of IB spec?
>
> From my understanding of current status.
> My naive approach of introducing GLOBAL_HCA object is the way to go and the real question
> is to understand how to configure it, am I right?
>
Global object won't work for below reason.
Lets take example that makes life easier.
Lets say two new RDMA objects exist which are not part of rdma cgroup
standard resource definition.
say, indirection table and PSM tags.
Both are abstracted using one global_hca resource object.
Say its given 10%.
Now IB core performs charging of each such object using GLOBAL_HCA.
(Because cgroup level there is only one object GLOBAL_HCA).
So two or more resources are mapped to single object.
Which means, one object can be charged more with total limit still
under 10%, thats leads to same problem as not having cgroup at all.

So my opinion is:
(a) Let cgroup define the current standard objects and new reasonable
set of vendor specific objects in future.
(b) Add new rdma.percentage parameter so that any new standard object
or vendor specific object can be abstracted from average end user and
applications which are yet to catch up.
I believe this takes care of your point (1), (3), (4)?

In other hypothetical design,
we can have rdma group as just pid to cgroup mapping facilitator.
All the charging/uncharging logic moves to IB core in form of library,
that standard ABI uverbs and vendor specific layer invokes. In this
approach there will be code duplicated in every such vendor driver.
By doing so, more callbacks will also have to be moved down till IB
core and vendor drivers for cgroup creation/deletion/offline etc.

This also means that lack of standard object definitions, may creates
more confusion to end user and orchestration applications. I prefer to
avoid such design.

Parav

^ permalink raw reply

* RE: [PATCH] qedr: Fix possible memory leak in qedr_create_qp()
From: Amrani, Ram @ 2016-11-01 10:38 UTC (permalink / raw)
  To: Wei Yongjun, Doug Ledford, Sean Hefty, Hal Rosenstock,
	Borundia, Rajesh
  Cc: Wei Yongjun, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477672427-31575-1-git-send-email-weiyj.lk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

> 'qp' is malloced in qedr_create_qp() and should be freed before leaving from the
> error handling cases, otherwise it will cause memory leak.
> 
> Signed-off-by: Wei Yongjun <weiyongjun1-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/infiniband/hw/qedr/verbs.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qedr/verbs.c
> b/drivers/infiniband/hw/qedr/verbs.c
> index a615142..b60f145 100644
> --- a/drivers/infiniband/hw/qedr/verbs.c
> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -1477,6 +1477,7 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  	struct qedr_ucontext *ctx = NULL;
>  	struct qedr_create_qp_ureq ureq;
>  	struct qedr_qp *qp;
> +	struct ib_qp *ibqp;
>  	int rc = 0;
> 
>  	DP_DEBUG(dev, QEDR_MSG_QP, "create qp: called from %s, pd=%p\n",
> @@ -1486,13 +1487,13 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  	if (rc)
>  		return ERR_PTR(rc);
> 
> +	if (attrs->srq)
> +		return ERR_PTR(-EINVAL);
> +
>  	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
>  	if (!qp)
>  		return ERR_PTR(-ENOMEM);
> 
> -	if (attrs->srq)
> -		return ERR_PTR(-EINVAL);
> -
>  	DP_DEBUG(dev, QEDR_MSG_QP,
>  		 "create qp: sq_cq=%p, sq_icid=%d, rq_cq=%p, rq_icid=%d\n",
>  		 get_qedr_cq(attrs->send_cq),
> @@ -1508,7 +1509,10 @@ struct ib_qp *qedr_create_qp(struct ib_pd *ibpd,
>  			       "create qp: unexpected udata when creating GSI
> QP\n");
>  			goto err0;
>  		}
> -		return qedr_create_gsi_qp(dev, attrs, qp);
> +		ibqp = qedr_create_gsi_qp(dev, attrs, qp);
> +		if (IS_ERR(ibqp))
> +			kfree(qp);
> +		return ibqp;
>  	}
> 
>  	memset(&in_params, 0, sizeof(in_params));

Thanks again

Acked-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox