public inbox for linux-kernel@vger.kernel.org
* [PATCH v3] nvmet,rxe: defer ip datagram sending to tasklet
@ 2018-05-07 20:58 Alexandru Moise
  2018-05-08  3:09 ` Yanjun Zhu
  0 siblings, 1 reply; 3+ messages in thread
From: Alexandru Moise @ 2018-05-07 20:58 UTC (permalink / raw)
  To: monis, dledford, jgg, linux-rdma, linux-kernel, yanjun.zhu

This addresses 3 separate problems:

1. When using NVMe over Fabrics we may end up sending IP
packets in interrupt context; this work should be deferred
to a tasklet.

[   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
[   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
[   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
[   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
[   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
[   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
[   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
[   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
[   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
[   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
[   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
[   50.966121] Call Trace:
[   50.966845]  <IRQ>
[   50.967497]  __dev_queue_xmit+0x62d/0x690
[   50.968722]  dev_queue_xmit+0x10/0x20
[   50.969894]  neigh_resolve_output+0x173/0x190
[   50.971244]  ip_finish_output2+0x2b8/0x370
[   50.972527]  ip_finish_output+0x1d2/0x220
[   50.973785]  ? ip_finish_output+0x1d2/0x220
[   50.975010]  ip_output+0xd4/0x100
[   50.975903]  ip_local_out+0x3b/0x50
[   50.976823]  rxe_send+0x74/0x120
[   50.977702]  rxe_requester+0xe3b/0x10b0
[   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
[   50.980260]  rxe_do_task+0x85/0x100
[   50.981386]  rxe_run_task+0x2f/0x40
[   50.982470]  rxe_post_send+0x51a/0x550
[   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
[   50.985024]  __nvmet_req_complete+0x95/0xa0
[   50.986287]  nvmet_req_complete+0x15/0x60
[   50.987469]  nvmet_bio_done+0x2d/0x40
[   50.988564]  bio_endio+0x12c/0x140
[   50.989654]  blk_update_request+0x185/0x2a0
[   50.990947]  blk_mq_end_request+0x1e/0x80
[   50.991997]  nvme_complete_rq+0x1cc/0x1e0
[   50.993171]  nvme_pci_complete_rq+0x117/0x120
[   50.994355]  __blk_mq_complete_request+0x15e/0x180
[   50.995988]  blk_mq_complete_request+0x6f/0xa0
[   50.997304]  nvme_process_cq+0xe0/0x1b0
[   50.998494]  nvme_irq+0x28/0x50
[   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
[   51.000986]  handle_irq_event_percpu+0x32/0x80
[   51.002356]  handle_irq_event+0x3c/0x60
[   51.003463]  handle_edge_irq+0x1c9/0x200
[   51.004473]  handle_irq+0x23/0x30
[   51.005363]  do_IRQ+0x46/0xd0
[   51.006182]  common_interrupt+0xf/0xf
[   51.007129]  </IRQ>
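
For context, rxe_run_task()'s second argument selects between running the
requester inline and deferring it to a tasklet. A simplified sketch of its
shape in the rxe_task.c of this era (reconstructed, so treat details as
approximate):

	void rxe_run_task(struct rxe_task *task, int sched)
	{
		if (sched)
			tasklet_schedule(&task->tasklet);  /* defer to softirq */
		else
			rxe_do_task((unsigned long)task);  /* run inline, in the caller's context */
	}

Passing sched == 1 therefore moves the ip_local_out() call out of the
interrupt handler and into the softirq that runs the tasklet.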

2. When using NVMe-oF, work must always be offloaded to the tasklet in
order to resolve the lock ordering problem between the neigh->ha_lock
seqlock and the NVMe queue lock:

[   77.833783]  Possible interrupt unsafe locking scenario:
[   77.833783]
[   77.835831]        CPU0                    CPU1
[   77.837129]        ----                    ----
[   77.838313]   lock(&(&n->ha_lock)->seqcount);
[   77.839550]                                local_irq_disable();
[   77.841377]                                lock(&(&nvmeq->q_lock)->rlock);
[   77.843222]                                lock(&(&n->ha_lock)->seqcount);
[   77.845178]   <Interrupt>
[   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
[   77.847986]
[   77.847986]  *** DEADLOCK ***

3. The same goes for the lock ordering between sch->q.lock and the NVMe
queue lock:

[   47.634271]  Possible interrupt unsafe locking scenario:
[   47.634271]
[   47.636452]        CPU0                    CPU1
[   47.637861]        ----                    ----
[   47.639285]   lock(&(&sch->q.lock)->rlock);
[   47.640654]                                local_irq_disable();
[   47.642451]                                lock(&(&nvmeq->q_lock)->rlock);
[   47.644521]                                lock(&(&sch->q.lock)->rlock);
[   47.646480]   <Interrupt>
[   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
[   47.648492]
[   47.648492]  *** DEADLOCK ***
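
Both lockdep reports describe the same underlying hazard: a lock that is
also taken from hardirq context (via the NVMe completion path) is held
elsewhere with interrupts enabled. A minimal illustrative sketch of the
simplest single-CPU form of this problem, with a plain spinlock standing
in for n->ha_lock / sch->q.lock (hypothetical code, not from the kernel):

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(lock_a);

	void taken_with_irqs_on(void)
	{
		spin_lock(&lock_a);	/* IRQs are still enabled while held */
		/* ... an NVMe completion interrupt fires on this CPU ... */
		spin_unlock(&lock_a);
	}

	void irq_path(void)	/* hardirq context, nvmeq->q_lock already held */
	{
		spin_lock(&lock_a);	/* spins forever: the owner was interrupted */
		spin_unlock(&lock_a);
	}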

With this patch applied, NVMe-oF finally seems to be stable; without it,
rxe eventually deadlocks the whole system and causes RCU stalls.

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
---
v2->v3: Abandoned the idea of offloading the work to the tasklet only
when in IRQ context. NVMe-oF needs the work to always be offloaded in
order to function. I am unsure how large an impact this has on other rxe
users.

 drivers/infiniband/sw/rxe/rxe_verbs.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 2cb52fd48cf1..564453060c54 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -761,7 +761,6 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
 	unsigned int mask;
 	unsigned int length = 0;
 	int i;
-	int must_sched;
 
 	while (wr) {
 		mask = wr_opcode_mask(wr->opcode, qp);
@@ -791,14 +790,7 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
 		wr = wr->next;
 	}
 
-	/*
-	 * Must sched in case of GSI QP because ib_send_mad() hold irq lock,
-	 * and the requester call ip_local_out_sk() that takes spin_lock_bh.
-	 */
-	must_sched = (qp_type(qp) == IB_QPT_GSI) ||
-			(queue_count(qp->sq.queue) > 1);
-
-	rxe_run_task(&qp->req.task, must_sched);
+	rxe_run_task(&qp->req.task, 1);
 	if (unlikely(qp->req.state == QP_STATE_ERROR))
 		rxe_run_task(&qp->comp.task, 1);
 
@@ -822,7 +814,7 @@ static int rxe_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 
 	if (qp->is_user) {
 		/* Utilize process context to do protocol processing */
-		rxe_run_task(&qp->req.task, 0);
+		rxe_run_task(&qp->req.task, 1);
 		return 0;
 	} else
 		return rxe_post_send_kernel(qp, wr, bad_wr);
-- 
2.17.0
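
The fix relies on tasklets running in softirq context, where paths that
take BH locks (such as ip_local_out()) are legal. A generic sketch of the
deferral pattern using the classic (unsigned long) tasklet API;
illustrative only, not rxe code:

	#include <linux/interrupt.h>

	static void xmit_work(unsigned long data)
	{
		/* Runs in softirq context: spin_lock_bh() and friends are safe. */
	}

	static DECLARE_TASKLET(xmit_tasklet, xmit_work, 0);

	void queue_xmit(void)	/* callable from hardirq or any atomic context */
	{
		tasklet_schedule(&xmit_tasklet);	/* defer; never transmit inline */
	}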


* Re: [PATCH v3] nvmet,rxe: defer ip datagram sending to tasklet
  2018-05-07 20:58 [PATCH v3] nvmet,rxe: defer ip datagram sending to tasklet Alexandru Moise
@ 2018-05-08  3:09 ` Yanjun Zhu
  2018-05-08  8:55   ` Alexandru Moise
  0 siblings, 1 reply; 3+ messages in thread
From: Yanjun Zhu @ 2018-05-08  3:09 UTC (permalink / raw)
  To: Alexandru Moise, monis, dledford, jgg, linux-rdma, linux-kernel



On 2018/5/8 4:58, Alexandru Moise wrote:
> This addresses 3 separate problems:
> 
> [... commit message, traces, and diff snipped ...]
> 
> @@ -822,7 +814,7 @@ static int rxe_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>   
>   	if (qp->is_user) {
>   		/* Utilize process context to do protocol processing */
> -		rxe_run_task(&qp->req.task, 0);
> +		rxe_run_task(&qp->req.task, 1);
Hi,

From your description, it seems that all of these problems are caused by
the NVMe kernel code. Am I correct?

This block handles the process context. If we keep
"rxe_run_task(&qp->req.task, 0);" here, do all of the problems still
occur?

Zhu Yanjun


>   		return 0;
>   	} else
>   		return rxe_post_send_kernel(qp, wr, bad_wr);


* Re: [PATCH v3] nvmet,rxe: defer ip datagram sending to tasklet
  2018-05-08  3:09 ` Yanjun Zhu
@ 2018-05-08  8:55   ` Alexandru Moise
  0 siblings, 0 replies; 3+ messages in thread
From: Alexandru Moise @ 2018-05-08  8:55 UTC (permalink / raw)
  To: Yanjun Zhu; +Cc: monis, dledford, jgg, linux-rdma, linux-kernel

On Tue, May 08, 2018 at 11:09:18AM +0800, Yanjun Zhu wrote:
> 
> 
> On 2018/5/8 4:58, Alexandru Moise wrote:
> > This addresses 3 separate problems:
> > 
> > [... commit message, traces, and diff snipped ...]
> > 
> >   	if (qp->is_user) {
> >   		/* Utilize process context to do protocol processing */
> > -		rxe_run_task(&qp->req.task, 0);
> > +		rxe_run_task(&qp->req.task, 1);
> Hi,
> 
> From your description, it seems that all of these problems are caused by
> the NVMe kernel code. Am I correct?
> 
> This block handles the process context. If we keep
> "rxe_run_task(&qp->req.task, 0);" here, do all of the problems still
> occur?
You are right; it seems that only rxe_post_send_kernel() needs to be
patched. If I keep "rxe_run_task(&qp->req.task, 0);" in rxe_post_send(),
I can't get rxe to hang.

Will send a v4 with just rxe_post_send_kernel() patched.

Thanks,
../Alex
> 
> Zhu Yanjun
> 
> 
> >   		return 0;
> >   	} else
> >   		return rxe_post_send_kernel(qp, wr, bad_wr);
> 
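
For illustration, the v4 direction agreed on above would presumably keep
process-context execution for user QPs and force tasklet deferral only in
rxe_post_send_kernel(). A hedged sketch, not the actual v4 patch, with
names taken from the v3 diff:

	/* Hypothetical sketch of the v4 approach discussed above: user QPs
	 * keep process-context execution, while rxe_post_send_kernel()
	 * always defers to the tasklet (sched == 1).
	 */
	static int rxe_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
				 struct ib_send_wr **bad_wr)
	{
		struct rxe_qp *qp = to_rqp(ibqp);

		/* ... error and state checks elided ... */

		if (qp->is_user) {
			/* Utilize process context to do protocol processing */
			rxe_run_task(&qp->req.task, 0);
			return 0;
		} else
			return rxe_post_send_kernel(qp, wr, bad_wr);
	}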

