* [PATCH 08/10] Move responsibility of cleaning up pool elements
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
Specific implementations must finish off the cleanup of pool elements
when it is the right moment. Reason for that is that a concreate cleanup
function may want postpone and schedule part of object destruction to
later time.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_cq.c | 2 ++
drivers/infiniband/sw/rxe/rxe_mcast.c | 2 ++
drivers/infiniband/sw/rxe/rxe_mr.c | 2 ++
drivers/infiniband/sw/rxe/rxe_pool.c | 9 ++++++++-
drivers/infiniband/sw/rxe/rxe_pool.h | 2 ++
5 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index ad3090131126..5693986c8c1b 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -182,4 +182,6 @@ void rxe_cq_cleanup(struct rxe_pool_entry *arg)
if (cq->queue)
rxe_queue_cleanup(cq->queue);
+
+ rxe_elem_cleanup(arg);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 24ebc4ca1b99..6453ae97653f 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -187,4 +187,6 @@ void rxe_mc_cleanup(struct rxe_pool_entry *arg)
rxe_drop_key(&grp->pelem);
rxe_mcast_delete(rxe, &grp->mgid);
+
+ rxe_elem_cleanup(arg);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 441b01e30afa..1c0940655df1 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -104,6 +104,8 @@ void rxe_mem_cleanup(struct rxe_pool_entry *arg)
kfree(mem->map);
}
+
+ rxe_elem_cleanup(arg);
}
static int rxe_mem_alloc(struct rxe_mem *mem, int num_buf)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 711d7d7f3692..3b14ab599662 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -462,9 +462,16 @@ void rxe_elem_release(struct kref *kref)
if (pool->cleanup)
pool->cleanup(elem);
+ else
+ rxe_elem_cleanup(elem);
+}
+
+void rxe_elem_cleanup(struct rxe_pool_entry *pelem)
+{
+ struct rxe_pool *pool = pelem->pool;
if (!(pool->flags & RXE_POOL_NO_ALLOC))
- kmem_cache_free(pool_cache(pool), elem);
+ kmem_cache_free(pool_cache(pool), pelem);
atomic_dec(&pool->num_elem);
ib_device_put(&pool->rxe->ib_dev);
rxe_pool_put(pool);
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 1e3508c74dc0..84cfbc749a5c 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -160,6 +160,8 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
/* cleanup an object when all references are dropped */
void rxe_elem_release(struct kref *kref);
+void rxe_elem_cleanup(struct rxe_pool_entry *pelem);
+
/* take a reference on an object */
static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
kref_get(&pelem->ref_cnt);
--
2.20.1
^ permalink raw reply related
* [PATCH 07/10] Pass the return value of kref_put further
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
Used in a later patch.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_pool.c | 3 ++-
drivers/infiniband/sw/rxe/rxe_pool.h | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 30a887cf9200..711d7d7f3692 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -541,7 +541,7 @@ static void rxe_dummy_release(struct kref *kref)
{
}
-void rxe_drop_ref(struct rxe_pool_entry *pelem)
+int rxe_drop_ref(struct rxe_pool_entry *pelem)
{
int res;
struct rxe_pool *pool = pelem->pool;
@@ -553,4 +553,5 @@ void rxe_drop_ref(struct rxe_pool_entry *pelem)
if (res) {
rxe_elem_release(&pelem->ref_cnt);
}
+ return res;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index b90cc84c5511..1e3508c74dc0 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -166,6 +166,6 @@ static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
}
/* drop a reference on an object */
-void rxe_drop_ref(struct rxe_pool_entry *pelem);
+int rxe_drop_ref(struct rxe_pool_entry *pelem);
#endif /* RXE_POOL_H */
--
2.20.1
^ permalink raw reply related
* [PATCH 06/10] Remove pd form rxe_ah
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
Pointer to PD is not used in rxe_ah anymore.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_verbs.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 5c4b2239129c..8dd65c2a7c72 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -73,7 +73,6 @@ struct rxe_pd {
struct rxe_ah {
struct ib_ah ibah;
struct rxe_pool_entry pelem;
- struct rxe_pd *pd;
struct rxe_av av;
};
--
2.20.1
^ permalink raw reply related
* [PATCH 05/10] Fix reference counting for rxe tasklets
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
Reference should be aquired *before* the tasklet is run. The best time
to increment the reference counter is at initialisation. Otherwise, the
object may not exists anymore by the time tasklet is run.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 4 ----
drivers/infiniband/sw/rxe/rxe_req.c | 4 ----
drivers/infiniband/sw/rxe/rxe_resp.c | 3 ---
drivers/infiniband/sw/rxe/rxe_task.c | 8 ++++++--
drivers/infiniband/sw/rxe/rxe_task.h | 2 +-
5 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index bdeb288673f3..3a1a33286f87 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -557,8 +557,6 @@ int rxe_completer(void *arg)
struct rxe_pkt_info *pkt = NULL;
enum comp_state state;
- rxe_add_ref(&qp->pelem);
-
if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
qp->req.state == QP_STATE_RESET) {
rxe_drain_resp_pkts(qp, qp->valid &&
@@ -780,7 +778,6 @@ int rxe_completer(void *arg)
* exit from the loop calling us
*/
WARN_ON_ONCE(skb);
- rxe_drop_ref(&qp->pelem);
return -EAGAIN;
done:
@@ -788,6 +785,5 @@ int rxe_completer(void *arg)
* us again to see if there is anything else to do
*/
WARN_ON_ONCE(skb);
- rxe_drop_ref(&qp->pelem);
return 0;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 0d018adbe512..c63e66873757 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -594,8 +594,6 @@ int rxe_requester(void *arg)
struct rxe_send_wqe rollback_wqe;
u32 rollback_psn;
- rxe_add_ref(&qp->pelem);
-
next_wqe:
if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
goto exit;
@@ -702,7 +700,6 @@ int rxe_requester(void *arg)
wqe->state = wqe_state_done;
wqe->status = IB_WC_SUCCESS;
__rxe_do_task(&qp->comp.task);
- rxe_drop_ref(&qp->pelem);
return 0;
}
payload = mtu;
@@ -753,6 +750,5 @@ int rxe_requester(void *arg)
__rxe_do_task(&qp->comp.task);
exit:
- rxe_drop_ref(&qp->pelem);
return -EAGAIN;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index f038bae1bd70..7be8a362d2ef 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -1212,8 +1212,6 @@ int rxe_responder(void *arg)
struct rxe_pkt_info *pkt = NULL;
int ret = 0;
- rxe_add_ref(&qp->pelem);
-
qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
if (!qp->valid) {
@@ -1392,6 +1390,5 @@ int rxe_responder(void *arg)
exit:
ret = -EAGAIN;
done:
- rxe_drop_ref(&qp->pelem);
return ret;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 7c2b6e4595f5..9d6b8ad08a3a 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -35,6 +35,7 @@
#include <linux/interrupt.h>
#include <linux/hardirq.h>
+#include "rxe.h"
#include "rxe_task.h"
int __rxe_do_task(struct rxe_task *task)
@@ -115,14 +116,15 @@ void rxe_do_task(unsigned long data)
}
int rxe_init_task(void *obj, struct rxe_task *task,
- void *arg, int (*func)(void *), char *name)
+ struct rxe_qp *qp, int (*func)(void *), char *name)
{
task->obj = obj;
- task->arg = arg;
+ task->arg = qp;
task->func = func;
snprintf(task->name, sizeof(task->name), "%s", name);
task->destroyed = false;
+ rxe_add_ref(&qp->pelem);
tasklet_init(&task->tasklet, rxe_do_task, (unsigned long)task);
task->state = TASK_STATE_START;
@@ -135,6 +137,7 @@ void rxe_cleanup_task(struct rxe_task *task)
{
unsigned long flags;
bool idle;
+ struct rxe_qp *qp = (struct rxe_qp *)task->arg;
/*
* Mark the task, then wait for it to finish. It might be
@@ -149,6 +152,7 @@ void rxe_cleanup_task(struct rxe_task *task)
} while (!idle);
tasklet_kill(&task->tasklet);
+ rxe_drop_ref(&qp->pelem);
}
void rxe_run_task(struct rxe_task *task)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 5c1fc7d5b953..671b1774b577 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -63,7 +63,7 @@ struct rxe_task {
* fcn => function to call until it returns != 0
*/
int rxe_init_task(void *obj, struct rxe_task *task,
- void *arg, int (*func)(void *), char *name);
+ struct rxe_qp *qp, int (*func)(void *), char *name);
/* cleanup task */
void rxe_cleanup_task(struct rxe_task *task);
--
2.20.1
^ permalink raw reply related
* [PATCH 03/10] Make pool interface more type safe
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
Replace void* with rxe_pool_entry* for some functions.
Change macro to inline function.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 18 ++++----
drivers/infiniband/sw/rxe/rxe_mcast.c | 20 ++++----
drivers/infiniband/sw/rxe/rxe_mr.c | 8 ++--
drivers/infiniband/sw/rxe/rxe_net.c | 6 +--
drivers/infiniband/sw/rxe/rxe_pool.c | 12 ++---
drivers/infiniband/sw/rxe/rxe_pool.h | 16 ++++---
drivers/infiniband/sw/rxe/rxe_qp.c | 32 ++++++-------
drivers/infiniband/sw/rxe/rxe_recv.c | 8 ++--
drivers/infiniband/sw/rxe/rxe_req.c | 8 ++--
drivers/infiniband/sw/rxe/rxe_resp.c | 22 ++++-----
drivers/infiniband/sw/rxe/rxe_verbs.c | 66 +++++++++++++--------------
11 files changed, 108 insertions(+), 108 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 5f3a43cfeb63..bdeb288673f3 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -534,7 +534,7 @@ static void rxe_drain_resp_pkts(struct rxe_qp *qp, bool notify)
struct rxe_send_wqe *wqe;
while ((skb = skb_dequeue(&qp->resp_pkts))) {
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
kfree_skb(skb);
}
@@ -557,7 +557,7 @@ int rxe_completer(void *arg)
struct rxe_pkt_info *pkt = NULL;
enum comp_state state;
- rxe_add_ref(qp);
+ rxe_add_ref(&qp->pelem);
if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
qp->req.state == QP_STATE_RESET) {
@@ -646,7 +646,7 @@ int rxe_completer(void *arg)
case COMPST_DONE:
if (pkt) {
- rxe_drop_ref(pkt->qp);
+ rxe_drop_ref(&pkt->qp->pelem);
kfree_skb(skb);
skb = NULL;
}
@@ -694,7 +694,7 @@ int rxe_completer(void *arg)
if (qp->comp.started_retry &&
!qp->comp.timeout_retry) {
if (pkt) {
- rxe_drop_ref(pkt->qp);
+ rxe_drop_ref(&pkt->qp->pelem);
kfree_skb(skb);
skb = NULL;
}
@@ -723,7 +723,7 @@ int rxe_completer(void *arg)
}
if (pkt) {
- rxe_drop_ref(pkt->qp);
+ rxe_drop_ref(&pkt->qp->pelem);
kfree_skb(skb);
skb = NULL;
}
@@ -748,7 +748,7 @@ int rxe_completer(void *arg)
mod_timer(&qp->rnr_nak_timer,
jiffies + rnrnak_jiffies(aeth_syn(pkt)
& ~AETH_TYPE_MASK));
- rxe_drop_ref(pkt->qp);
+ rxe_drop_ref(&pkt->qp->pelem);
kfree_skb(skb);
skb = NULL;
goto exit;
@@ -766,7 +766,7 @@ int rxe_completer(void *arg)
rxe_qp_error(qp);
if (pkt) {
- rxe_drop_ref(pkt->qp);
+ rxe_drop_ref(&pkt->qp->pelem);
kfree_skb(skb);
skb = NULL;
}
@@ -780,7 +780,7 @@ int rxe_completer(void *arg)
* exit from the loop calling us
*/
WARN_ON_ONCE(skb);
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
return -EAGAIN;
done:
@@ -788,6 +788,6 @@ int rxe_completer(void *arg)
* us again to see if there is anything else to do
*/
WARN_ON_ONCE(skb);
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
return 0;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 522a7942c56c..24ebc4ca1b99 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -59,7 +59,7 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
spin_lock_init(&grp->mcg_lock);
grp->rxe = rxe;
- rxe_add_key(grp, mgid);
+ rxe_add_key(&grp->pelem, mgid);
err = rxe_mcast_add(rxe, mgid);
if (err)
@@ -70,7 +70,7 @@ int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
return 0;
err2:
- rxe_drop_ref(grp);
+ rxe_drop_ref(&grp->pelem);
err1:
return err;
}
@@ -103,7 +103,7 @@ int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
}
/* each qp holds a ref on the grp */
- rxe_add_ref(grp);
+ rxe_add_ref(&grp->pelem);
grp->num_qp++;
elem->qp = qp;
@@ -140,16 +140,16 @@ int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
spin_unlock_bh(&grp->mcg_lock);
spin_unlock_bh(&qp->grp_lock);
- rxe_drop_ref(elem);
- rxe_drop_ref(grp); /* ref held by QP */
- rxe_drop_ref(grp); /* ref from get_key */
+ rxe_drop_ref(&elem->pelem);
+ rxe_drop_ref(&grp->pelem); /* ref held by QP */
+ rxe_drop_ref(&grp->pelem); /* ref from get_key */
return 0;
}
}
spin_unlock_bh(&grp->mcg_lock);
spin_unlock_bh(&qp->grp_lock);
- rxe_drop_ref(grp); /* ref from get_key */
+ rxe_drop_ref(&grp->pelem); /* ref from get_key */
err1:
return -EINVAL;
}
@@ -175,8 +175,8 @@ void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
list_del(&elem->qp_list);
grp->num_qp--;
spin_unlock_bh(&grp->mcg_lock);
- rxe_drop_ref(grp);
- rxe_drop_ref(elem);
+ rxe_drop_ref(&grp->pelem);
+ rxe_drop_ref(&elem->pelem);
}
}
@@ -185,6 +185,6 @@ void rxe_mc_cleanup(struct rxe_pool_entry *arg)
struct rxe_mc_grp *grp = container_of(arg, typeof(*grp), pelem);
struct rxe_dev *rxe = grp->rxe;
- rxe_drop_key(grp);
+ rxe_drop_key(&grp->pelem);
rxe_mcast_delete(rxe, &grp->mgid);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index ea6a819b7167..441b01e30afa 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -470,7 +470,7 @@ int copy_data(
if (offset >= sge->length) {
if (mem) {
- rxe_drop_ref(mem);
+ rxe_drop_ref(&mem->pelem);
mem = NULL;
}
sge++;
@@ -515,13 +515,13 @@ int copy_data(
dma->resid = resid;
if (mem)
- rxe_drop_ref(mem);
+ rxe_drop_ref(&mem->pelem);
return 0;
err2:
if (mem)
- rxe_drop_ref(mem);
+ rxe_drop_ref(&mem->pelem);
err1:
return err;
}
@@ -581,7 +581,7 @@ struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key,
mem->pd != pd ||
(access && !(access & mem->access)) ||
mem->state != RXE_MEM_STATE_VALID)) {
- rxe_drop_ref(mem);
+ rxe_drop_ref(&mem->pelem);
mem = NULL;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index e50f19fadcf9..f8c3604bc5ad 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -416,7 +416,7 @@ static void rxe_skb_tx_dtor(struct sk_buff *skb)
skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
rxe_run_task(&qp->req.task);
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
}
int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
@@ -426,7 +426,7 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
skb->destructor = rxe_skb_tx_dtor;
skb->sk = pkt->qp->sk->sk;
- rxe_add_ref(pkt->qp);
+ rxe_add_ref(&pkt->qp->pelem);
atomic_inc(&pkt->qp->skb_out);
if (skb->protocol == htons(ETH_P_IP)) {
@@ -436,7 +436,7 @@ int rxe_send(struct rxe_pkt_info *pkt, struct sk_buff *skb)
} else {
pr_err("Unknown layer 3 protocol: %d\n", skb->protocol);
atomic_dec(&pkt->qp->skb_out);
- rxe_drop_ref(pkt->qp);
+ rxe_drop_ref(&pkt->qp->pelem);
kfree_skb(skb);
return -EINVAL;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index fbcbac52290b..efa9bab01e02 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -338,9 +338,8 @@ static void insert_key(struct rxe_pool *pool, struct rxe_pool_entry *new)
return;
}
-void rxe_add_key(void *arg, void *key)
+void rxe_add_key(struct rxe_pool_entry *elem, void *key)
{
- struct rxe_pool_entry *elem = arg;
struct rxe_pool *pool = elem->pool;
unsigned long flags;
@@ -350,9 +349,8 @@ void rxe_add_key(void *arg, void *key)
write_unlock_irqrestore(&pool->pool_lock, flags);
}
-void rxe_drop_key(void *arg)
+void rxe_drop_key(struct rxe_pool_entry *elem)
{
- struct rxe_pool_entry *elem = arg;
struct rxe_pool *pool = elem->pool;
unsigned long flags;
@@ -361,9 +359,8 @@ void rxe_drop_key(void *arg)
write_unlock_irqrestore(&pool->pool_lock, flags);
}
-void rxe_add_index(void *arg)
+void rxe_add_index(struct rxe_pool_entry *elem)
{
- struct rxe_pool_entry *elem = arg;
struct rxe_pool *pool = elem->pool;
unsigned long flags;
@@ -373,9 +370,8 @@ void rxe_add_index(void *arg)
write_unlock_irqrestore(&pool->pool_lock, flags);
}
-void rxe_drop_index(void *arg)
+void rxe_drop_index(struct rxe_pool_entry *elem)
{
- struct rxe_pool_entry *elem = arg;
struct rxe_pool *pool = elem->pool;
unsigned long flags;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 2f2cff1cbe43..5c6a9429f541 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -138,18 +138,18 @@ int rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_entry *elem);
/* assign an index to an indexed object and insert object into
* pool's rb tree
*/
-void rxe_add_index(void *elem);
+void rxe_add_index(struct rxe_pool_entry *elem);
/* drop an index and remove object from rb tree */
-void rxe_drop_index(void *elem);
+void rxe_drop_index(struct rxe_pool_entry *elem);
/* assign a key to a keyed object and insert object into
* pool's rb tree
*/
-void rxe_add_key(void *elem, void *key);
+void rxe_add_key(struct rxe_pool_entry *elem, void *key);
/* remove elem from rb tree */
-void rxe_drop_key(void *elem);
+void rxe_drop_key(struct rxe_pool_entry *elem);
/* lookup an indexed object from index. takes a reference on object */
void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
@@ -161,9 +161,13 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key);
void rxe_elem_release(struct kref *kref);
/* take a reference on an object */
-#define rxe_add_ref(elem) kref_get(&(elem)->pelem.ref_cnt)
+static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
+ kref_get(&pelem->ref_cnt);
+}
/* drop a reference on an object */
-#define rxe_drop_ref(elem) kref_put(&(elem)->pelem.ref_cnt, rxe_elem_release)
+static inline void rxe_drop_ref(struct rxe_pool_entry *pelem) {
+ kref_put(&pelem->ref_cnt, rxe_elem_release);
+}
#endif /* RXE_POOL_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 623f44f1d1d5..7cd929185581 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -152,11 +152,11 @@ static void free_rd_atomic_resources(struct rxe_qp *qp)
void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res)
{
if (res->type == RXE_ATOMIC_MASK) {
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
kfree_skb(res->atomic.skb);
} else if (res->type == RXE_READ_MASK) {
if (res->read.mr)
- rxe_drop_ref(res->read.mr);
+ rxe_drop_ref(&res->read.mr->pelem);
}
res->type = 0;
}
@@ -344,11 +344,11 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
struct rxe_cq *scq = to_rcq(init->send_cq);
struct rxe_srq *srq = init->srq ? to_rsrq(init->srq) : NULL;
- rxe_add_ref(pd);
- rxe_add_ref(rcq);
- rxe_add_ref(scq);
+ rxe_add_ref(&pd->pelem);
+ rxe_add_ref(&rcq->pelem);
+ rxe_add_ref(&scq->pelem);
if (srq)
- rxe_add_ref(srq);
+ rxe_add_ref(&srq->pelem);
qp->pd = pd;
qp->rcq = rcq;
@@ -374,10 +374,10 @@ int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
rxe_queue_cleanup(qp->sq.queue);
err1:
if (srq)
- rxe_drop_ref(srq);
- rxe_drop_ref(scq);
- rxe_drop_ref(rcq);
- rxe_drop_ref(pd);
+ rxe_drop_ref(&srq->pelem);
+ rxe_drop_ref(&scq->pelem);
+ rxe_drop_ref(&rcq->pelem);
+ rxe_drop_ref(&pd->pelem);
return err;
}
@@ -536,7 +536,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
qp->resp.sent_psn_nak = 0;
if (qp->resp.mr) {
- rxe_drop_ref(qp->resp.mr);
+ rxe_drop_ref(&qp->resp.mr->pelem);
qp->resp.mr = NULL;
}
@@ -812,20 +812,20 @@ static void rxe_qp_do_cleanup(struct work_struct *work)
rxe_queue_cleanup(qp->sq.queue);
if (qp->srq)
- rxe_drop_ref(qp->srq);
+ rxe_drop_ref(&qp->srq->pelem);
if (qp->rq.queue)
rxe_queue_cleanup(qp->rq.queue);
if (qp->scq)
- rxe_drop_ref(qp->scq);
+ rxe_drop_ref(&qp->scq->pelem);
if (qp->rcq)
- rxe_drop_ref(qp->rcq);
+ rxe_drop_ref(&qp->rcq->pelem);
if (qp->pd)
- rxe_drop_ref(qp->pd);
+ rxe_drop_ref(&qp->pd->pelem);
if (qp->resp.mr) {
- rxe_drop_ref(qp->resp.mr);
+ rxe_drop_ref(&qp->resp.mr->pelem);
qp->resp.mr = NULL;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index f9a492ed900b..bd8dc133a722 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -261,7 +261,7 @@ static int hdr_check(struct rxe_pkt_info *pkt)
return 0;
err2:
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
err1:
return -EINVAL;
}
@@ -316,13 +316,13 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
skb_get(skb);
pkt->qp = qp;
- rxe_add_ref(qp);
+ rxe_add_ref(&qp->pelem);
rxe_rcv_pkt(pkt, skb);
}
spin_unlock_bh(&mcg->mcg_lock);
- rxe_drop_ref(mcg); /* drop ref from rxe_pool_get_key. */
+ rxe_drop_ref(&mcg->pelem); /* drop ref from rxe_pool_get_key. */
err1:
kfree_skb(skb);
@@ -415,7 +415,7 @@ void rxe_rcv(struct sk_buff *skb)
drop:
if (pkt->qp)
- rxe_drop_ref(pkt->qp);
+ rxe_drop_ref(&pkt->qp->pelem);
kfree_skb(skb);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index a84c0407545b..0d018adbe512 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -594,7 +594,7 @@ int rxe_requester(void *arg)
struct rxe_send_wqe rollback_wqe;
u32 rollback_psn;
- rxe_add_ref(qp);
+ rxe_add_ref(&qp->pelem);
next_wqe:
if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
@@ -633,7 +633,7 @@ int rxe_requester(void *arg)
goto exit;
}
rmr->state = RXE_MEM_STATE_FREE;
- rxe_drop_ref(rmr);
+ rxe_drop_ref(&rmr->pelem);
wqe->state = wqe_state_done;
wqe->status = IB_WC_SUCCESS;
} else if (wqe->wr.opcode == IB_WR_REG_MR) {
@@ -702,7 +702,7 @@ int rxe_requester(void *arg)
wqe->state = wqe_state_done;
wqe->status = IB_WC_SUCCESS;
__rxe_do_task(&qp->comp.task);
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
return 0;
}
payload = mtu;
@@ -753,6 +753,6 @@ int rxe_requester(void *arg)
__rxe_do_task(&qp->comp.task);
exit:
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
return -EAGAIN;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index d4b5535b8517..f038bae1bd70 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -118,7 +118,7 @@ static inline enum resp_states get_req(struct rxe_qp *qp,
if (qp->resp.state == QP_STATE_ERROR) {
while ((skb = skb_dequeue(&qp->req_pkts))) {
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
kfree_skb(skb);
}
@@ -494,7 +494,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
err:
if (mem)
- rxe_drop_ref(mem);
+ rxe_drop_ref(&mem->pelem);
return state;
}
@@ -910,7 +910,7 @@ static enum resp_states do_complete(struct rxe_qp *qp,
return RESPST_ERROR;
}
rmr->state = RXE_MEM_STATE_FREE;
- rxe_drop_ref(rmr);
+ rxe_drop_ref(&rmr->pelem);
}
wc->qp = &qp->ibqp;
@@ -980,7 +980,7 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
goto out;
}
- rxe_add_ref(qp);
+ rxe_add_ref(&qp->pelem);
res = &qp->resp.resources[qp->resp.res_head];
free_rd_atomic_resource(qp, res);
@@ -1000,7 +1000,7 @@ static int send_atomic_ack(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
rc = rxe_xmit_packet(qp, &ack_pkt, skb);
if (rc) {
pr_err_ratelimited("Failed sending ack\n");
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
}
out:
return rc;
@@ -1029,12 +1029,12 @@ static enum resp_states cleanup(struct rxe_qp *qp,
if (pkt) {
skb = skb_dequeue(&qp->req_pkts);
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
kfree_skb(skb);
}
if (qp->resp.mr) {
- rxe_drop_ref(qp->resp.mr);
+ rxe_drop_ref(&qp->resp.mr->pelem);
qp->resp.mr = NULL;
}
@@ -1180,7 +1180,7 @@ static enum resp_states do_class_d1e_error(struct rxe_qp *qp)
}
if (qp->resp.mr) {
- rxe_drop_ref(qp->resp.mr);
+ rxe_drop_ref(&qp->resp.mr->pelem);
qp->resp.mr = NULL;
}
@@ -1193,7 +1193,7 @@ static void rxe_drain_req_pkts(struct rxe_qp *qp, bool notify)
struct sk_buff *skb;
while ((skb = skb_dequeue(&qp->req_pkts))) {
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
kfree_skb(skb);
}
@@ -1212,7 +1212,7 @@ int rxe_responder(void *arg)
struct rxe_pkt_info *pkt = NULL;
int ret = 0;
- rxe_add_ref(qp);
+ rxe_add_ref(&qp->pelem);
qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
@@ -1392,6 +1392,6 @@ int rxe_responder(void *arg)
exit:
ret = -EAGAIN;
done:
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
return ret;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index f6b25b409d12..545eff108070 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -154,7 +154,7 @@ static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
{
struct rxe_ucontext *uc = to_ruc(ibuc);
- rxe_drop_ref(uc);
+ rxe_drop_ref(&uc->pelem);
}
static int rxe_port_immutable(struct ib_device *dev, u8 port_num,
@@ -188,7 +188,7 @@ static void rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
{
struct rxe_pd *pd = to_rpd(ibpd);
- rxe_drop_ref(pd);
+ rxe_drop_ref(&pd->pelem);
}
static int rxe_create_ah(struct ib_ah *ibah, struct rdma_ah_attr *attr,
@@ -239,7 +239,7 @@ static void rxe_destroy_ah(struct ib_ah *ibah, u32 flags)
{
struct rxe_ah *ah = to_rah(ibah);
- rxe_drop_ref(ah);
+ rxe_drop_ref(&ah->pelem);
}
static int post_one_recv(struct rxe_rq *rq, const struct ib_recv_wr *ibwr)
@@ -312,7 +312,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
if (err)
goto err1;
- rxe_add_ref(pd);
+ rxe_add_ref(&pd->pelem);
srq->pd = pd;
err = rxe_srq_from_init(rxe, srq, init, udata, uresp);
@@ -322,8 +322,8 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
return 0;
err2:
- rxe_drop_ref(pd);
- rxe_drop_ref(srq);
+ rxe_drop_ref(&pd->pelem);
+ rxe_drop_ref(&srq->pelem);
err1:
return err;
}
@@ -380,8 +380,8 @@ static void rxe_destroy_srq(struct ib_srq *ibsrq, struct ib_udata *udata)
if (srq->rq.queue)
rxe_queue_cleanup(srq->rq.queue);
- rxe_drop_ref(srq->pd);
- rxe_drop_ref(srq);
+ rxe_drop_ref(&srq->pd->pelem);
+ rxe_drop_ref(&srq->pelem);
}
static int rxe_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
@@ -442,7 +442,7 @@ static struct ib_qp *rxe_create_qp(struct ib_pd *ibpd,
qp->is_user = 1;
}
- rxe_add_index(qp);
+ rxe_add_index(&qp->pelem);
err = rxe_qp_from_init(rxe, qp, pd, init, uresp, ibpd, udata);
if (err)
@@ -451,9 +451,9 @@ static struct ib_qp *rxe_create_qp(struct ib_pd *ibpd,
return &qp->ibqp;
err3:
- rxe_drop_index(qp);
+ rxe_drop_index(&qp->pelem);
err2:
- rxe_drop_ref(qp);
+ rxe_drop_ref(&qp->pelem);
err1:
return ERR_PTR(err);
}
@@ -495,8 +495,8 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
struct rxe_qp *qp = to_rqp(ibqp);
rxe_qp_destroy(qp);
- rxe_drop_index(qp);
- rxe_drop_ref(qp);
+ rxe_drop_index(&qp->pelem);
+ rxe_drop_ref(&qp->pelem);
return 0;
}
@@ -814,7 +814,7 @@ static void rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
rxe_cq_disable(cq);
- rxe_drop_ref(cq);
+ rxe_drop_ref(&cq->pelem);
}
static int rxe_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
@@ -904,9 +904,9 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
goto err1;
}
- rxe_add_index(mr);
+ rxe_add_index(&mr->pelem);
- rxe_add_ref(pd);
+ rxe_add_ref(&pd->pelem);
err = rxe_mem_init_dma(pd, access, mr);
if (err)
@@ -915,9 +915,9 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
return &mr->ibmr;
err2:
- rxe_drop_ref(pd);
- rxe_drop_index(mr);
- rxe_drop_ref(mr);
+ rxe_drop_ref(&pd->pelem);
+ rxe_drop_index(&mr->pelem);
+ rxe_drop_ref(&mr->pelem);
err1:
return ERR_PTR(err);
}
@@ -939,9 +939,9 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
goto err2;
}
- rxe_add_index(mr);
+ rxe_add_index(&mr->pelem);
- rxe_add_ref(pd);
+ rxe_add_ref(&pd->pelem);
err = rxe_mem_init_user(pd, start, length, iova,
access, udata, mr);
@@ -951,9 +951,9 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
return &mr->ibmr;
err3:
- rxe_drop_ref(pd);
- rxe_drop_index(mr);
- rxe_drop_ref(mr);
+ rxe_drop_ref(&pd->pelem);
+ rxe_drop_index(&mr->pelem);
+ rxe_drop_ref(&mr->pelem);
err2:
return ERR_PTR(err);
}
@@ -963,9 +963,9 @@ static int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
struct rxe_mem *mr = to_rmr(ibmr);
mr->state = RXE_MEM_STATE_ZOMBIE;
- rxe_drop_ref(mr->pd);
- rxe_drop_index(mr);
- rxe_drop_ref(mr);
+ rxe_drop_ref(&mr->pd->pelem);
+ rxe_drop_index(&mr->pelem);
+ rxe_drop_ref(&mr->pelem);
return 0;
}
@@ -986,9 +986,9 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
goto err1;
}
- rxe_add_index(mr);
+ rxe_add_index(&mr->pelem);
- rxe_add_ref(pd);
+ rxe_add_ref(&pd->pelem);
err = rxe_mem_init_fast(pd, max_num_sg, mr);
if (err)
@@ -997,9 +997,9 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
return &mr->ibmr;
err2:
- rxe_drop_ref(pd);
- rxe_drop_index(mr);
- rxe_drop_ref(mr);
+ rxe_drop_ref(&pd->pelem);
+ rxe_drop_index(&mr->pelem);
+ rxe_drop_ref(&mr->pelem);
err1:
return ERR_PTR(err);
}
@@ -1057,7 +1057,7 @@ static int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
err = rxe_mcast_add_grp_elem(rxe, qp, grp);
- rxe_drop_ref(grp);
+ rxe_drop_ref(&grp->pelem);
return err;
}
--
2.20.1
^ permalink raw reply related
* [PATCH 04/10] Protect kref_put with the lock
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
Need to ensure that kref_put does not run concurrently with the loop
inside rxe_pool_get_key.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_pool.c | 18 ++++++++++++++++++
drivers/infiniband/sw/rxe/rxe_pool.h | 4 +---
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index efa9bab01e02..30a887cf9200 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -536,3 +536,21 @@ void *rxe_pool_get_key(struct rxe_pool *pool, void *key)
read_unlock_irqrestore(&pool->pool_lock, flags);
return node ? elem : NULL;
}
+
+static void rxe_dummy_release(struct kref *kref)
+{
+}
+
+void rxe_drop_ref(struct rxe_pool_entry *pelem)
+{
+ int res;
+ struct rxe_pool *pool = pelem->pool;
+ unsigned long flags;
+
+ write_lock_irqsave(&pool->pool_lock, flags);
+ res = kref_put(&pelem->ref_cnt, rxe_dummy_release);
+ write_unlock_irqrestore(&pool->pool_lock, flags);
+ if (res) {
+ rxe_elem_release(&pelem->ref_cnt);
+ }
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 5c6a9429f541..b90cc84c5511 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -166,8 +166,6 @@ static inline void rxe_add_ref(struct rxe_pool_entry *pelem) {
}
/* drop a reference on an object */
-static inline void rxe_drop_ref(struct rxe_pool_entry *pelem) {
- kref_put(&pelem->ref_cnt, rxe_elem_release);
-}
+void rxe_drop_ref(struct rxe_pool_entry *pelem);
#endif /* RXE_POOL_H */
--
2.20.1
^ permalink raw reply related
* [PATCH 02/10] Remove counter that does not have a meaning anymore
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
After putting all tasks to schedule, this counter does not have a
meaning anymore.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 6 ------
drivers/infiniband/sw/rxe/rxe_hw_counters.c | 1 -
drivers/infiniband/sw/rxe/rxe_hw_counters.h | 1 -
3 files changed, 8 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index ad09bd9d0a82..5f3a43cfeb63 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -148,14 +148,8 @@ void retransmit_timer(struct timer_list *t)
void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
{
- int must_sched;
-
skb_queue_tail(&qp->resp_pkts, skb);
- must_sched = skb_queue_len(&qp->resp_pkts) > 1;
- if (must_sched != 0)
- rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
-
rxe_run_task(&qp->comp.task);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index 636edb5f4cf4..9bc762b0e66a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -41,7 +41,6 @@ static const char * const rxe_counter_name[] = {
[RXE_CNT_RCV_RNR] = "rcvd_rnr_err",
[RXE_CNT_SND_RNR] = "send_rnr_err",
[RXE_CNT_RCV_SEQ_ERR] = "rcvd_seq_err",
- [RXE_CNT_COMPLETER_SCHED] = "ack_deferred",
[RXE_CNT_RETRY_EXCEEDED] = "retry_exceeded_err",
[RXE_CNT_RNR_RETRY_EXCEEDED] = "retry_rnr_exceeded_err",
[RXE_CNT_COMP_RETRY] = "completer_retry_err",
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 72c0d63c79e0..cfe19159a77f 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -45,7 +45,6 @@ enum rxe_counters {
RXE_CNT_RCV_RNR,
RXE_CNT_SND_RNR,
RXE_CNT_RCV_SEQ_ERR,
- RXE_CNT_COMPLETER_SCHED,
RXE_CNT_RETRY_EXCEEDED,
RXE_CNT_RNR_RETRY_EXCEEDED,
RXE_CNT_COMP_RETRY,
--
2.20.1
^ permalink raw reply related
* [PATCH 01/10] Simplify rxe_run_task interface
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
In-Reply-To: <20190722151426.5266-1-mplaneta@os.inf.tu-dresden.de>
Make rxe_run_task only schedule tasks and never run them directly.
This simplification will be used in further patches.
Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++--------
drivers/infiniband/sw/rxe/rxe_loc.h | 2 +-
drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
drivers/infiniband/sw/rxe/rxe_qp.c | 10 +++++-----
drivers/infiniband/sw/rxe/rxe_req.c | 6 +++---
drivers/infiniband/sw/rxe/rxe_resp.c | 8 +-------
drivers/infiniband/sw/rxe/rxe_task.c | 7 ++-----
drivers/infiniband/sw/rxe/rxe_task.h | 6 +++---
drivers/infiniband/sw/rxe/rxe_verbs.c | 8 ++++----
9 files changed, 28 insertions(+), 37 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 116cafc9afcf..ad09bd9d0a82 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -142,7 +142,7 @@ void retransmit_timer(struct timer_list *t)
if (qp->valid) {
qp->comp.timeout = 1;
- rxe_run_task(&qp->comp.task, 1);
+ rxe_run_task(&qp->comp.task);
}
}
@@ -156,7 +156,7 @@ void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
if (must_sched != 0)
rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
- rxe_run_task(&qp->comp.task, must_sched);
+ rxe_run_task(&qp->comp.task);
}
static inline enum comp_state get_wqe(struct rxe_qp *qp,
@@ -329,7 +329,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
qp->comp.psn = pkt->psn;
if (qp->req.wait_psn) {
qp->req.wait_psn = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
}
return COMPST_ERROR_RETRY;
@@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
*/
if (qp->req.wait_fence) {
qp->req.wait_fence = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
}
@@ -479,7 +479,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
if (qp->req.need_rd_atomic) {
qp->comp.timeout_retry = 0;
qp->req.need_rd_atomic = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
}
@@ -525,7 +525,7 @@ static inline enum comp_state complete_wqe(struct rxe_qp *qp,
if (qp->req.wait_psn) {
qp->req.wait_psn = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
}
@@ -644,7 +644,7 @@ int rxe_completer(void *arg)
if (qp->req.wait_psn) {
qp->req.wait_psn = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
state = COMPST_DONE;
@@ -725,7 +725,7 @@ int rxe_completer(void *arg)
RXE_CNT_COMP_RETRY);
qp->req.need_retry = 1;
qp->comp.started_retry = 1;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
if (pkt) {
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 775c23becaec..b7159d9d5107 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -277,7 +277,7 @@ static inline int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
if ((qp_type(qp) != IB_QPT_RC) &&
(pkt->mask & RXE_END_MASK)) {
pkt->wqe->state = wqe_state_done;
- rxe_run_task(&qp->comp.task, 1);
+ rxe_run_task(&qp->comp.task);
}
rxe_counter_inc(rxe, RXE_CNT_SENT_PKTS);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 5a3474f9351b..e50f19fadcf9 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -414,7 +414,7 @@ static void rxe_skb_tx_dtor(struct sk_buff *skb)
if (unlikely(qp->need_req_skb &&
skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
rxe_drop_ref(qp);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index e2c6d1cedf41..623f44f1d1d5 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -560,10 +560,10 @@ static void rxe_qp_drain(struct rxe_qp *qp)
if (qp->req.state != QP_STATE_DRAINED) {
qp->req.state = QP_STATE_DRAIN;
if (qp_type(qp) == IB_QPT_RC)
- rxe_run_task(&qp->comp.task, 1);
+ rxe_run_task(&qp->comp.task);
else
__rxe_do_task(&qp->comp.task);
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
}
}
@@ -576,13 +576,13 @@ void rxe_qp_error(struct rxe_qp *qp)
qp->attr.qp_state = IB_QPS_ERR;
/* drain work and packet queues */
- rxe_run_task(&qp->resp.task, 1);
+ rxe_run_task(&qp->resp.task);
if (qp_type(qp) == IB_QPT_RC)
- rxe_run_task(&qp->comp.task, 1);
+ rxe_run_task(&qp->comp.task);
else
__rxe_do_task(&qp->comp.task);
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
/* called by the modify qp verb */
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index c5d9b558fa90..a84c0407545b 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -126,7 +126,7 @@ void rnr_nak_timer(struct timer_list *t)
struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
}
static struct rxe_send_wqe *req_next_wqe(struct rxe_qp *qp)
@@ -651,7 +651,7 @@ int rxe_requester(void *arg)
}
if ((wqe->wr.send_flags & IB_SEND_SIGNALED) ||
qp->sq_sig_type == IB_SIGNAL_ALL_WR)
- rxe_run_task(&qp->comp.task, 1);
+ rxe_run_task(&qp->comp.task);
qp->req.wqe_index = next_index(qp->sq.queue,
qp->req.wqe_index);
goto next_wqe;
@@ -736,7 +736,7 @@ int rxe_requester(void *arg)
rollback_state(wqe, qp, &rollback_wqe, rollback_psn);
if (ret == -EAGAIN) {
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
goto exit;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 1cbfbd98eb22..d4b5535b8517 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -106,15 +106,9 @@ static char *resp_state_name[] = {
/* rxe_recv calls here to add a request packet to the input queue */
void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
{
- int must_sched;
- struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-
skb_queue_tail(&qp->req_pkts, skb);
- must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
- (skb_queue_len(&qp->req_pkts) > 1);
-
- rxe_run_task(&qp->resp.task, must_sched);
+ rxe_run_task(&qp->resp.task);
}
static inline enum resp_states get_req(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 08f05ac5f5d5..7c2b6e4595f5 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -151,15 +151,12 @@ void rxe_cleanup_task(struct rxe_task *task)
tasklet_kill(&task->tasklet);
}
-void rxe_run_task(struct rxe_task *task, int sched)
+void rxe_run_task(struct rxe_task *task)
{
if (task->destroyed)
return;
- if (sched)
- tasklet_schedule(&task->tasklet);
- else
- rxe_do_task((unsigned long)task);
+ tasklet_schedule(&task->tasklet);
}
void rxe_disable_task(struct rxe_task *task)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index 08ff42d451c6..5c1fc7d5b953 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -82,10 +82,10 @@ int __rxe_do_task(struct rxe_task *task);
*/
void rxe_do_task(unsigned long data);
-/* run a task, else schedule it to run as a tasklet, The decision
- * to run or schedule tasklet is based on the parameter sched.
+/*
+ * schedule task to run as a tasklet.
*/
-void rxe_run_task(struct rxe_task *task, int sched);
+void rxe_run_task(struct rxe_task *task);
/* keep a task from scheduling */
void rxe_disable_task(struct rxe_task *task);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 4ebdfcf4d33e..f6b25b409d12 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -708,9 +708,9 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, const struct ib_send_wr *wr,
wr = wr->next;
}
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task);
if (unlikely(qp->req.state == QP_STATE_ERROR))
- rxe_run_task(&qp->comp.task, 1);
+ rxe_run_task(&qp->comp.task);
return err;
}
@@ -732,7 +732,7 @@ static int rxe_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr,
if (qp->is_user) {
/* Utilize process context to do protocol processing */
- rxe_run_task(&qp->req.task, 0);
+ rxe_run_task(&qp->req.task);
return 0;
} else
return rxe_post_send_kernel(qp, wr, bad_wr);
@@ -772,7 +772,7 @@ static int rxe_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
spin_unlock_irqrestore(&rq->producer_lock, flags);
if (qp->resp.state == QP_STATE_ERROR)
- rxe_run_task(&qp->resp.task, 1);
+ rxe_run_task(&qp->resp.task);
err1:
return err;
--
2.20.1
^ permalink raw reply related
* [PATCH 00/10] Refactor rxe driver to remove multiple race conditions
From: Maksym Planeta @ 2019-07-22 15:14 UTC (permalink / raw)
To: Moni Shoua, Doug Ledford, Jason Gunthorpe, linux-rdma,
linux-kernel
Cc: Maksym Planeta
This patchset helps to get rid of following race condition situations:
1. Tasklet functions were incrementing reference counting after entering
running the tasklet.
2. Getting a pointer to reference counted object (kref) was done without
protecting kref_put with a lock.
3. QP cleanup was sometimes scheduling cleanup for later execution in
rxe_qp_do_cleaunpm, although this QP's memory could be freed immediately after
returning from rxe_qp_cleanup.
4. Non-atomic cleanup functions could be called in SoftIRQ context
5. Manipulating with reference counter inside a critical section could have
been done both inside and outside of SoftIRQ region. Such behavior may end up
in a deadlock.
The easiest way to observe these problems is to compile the kernel with KASAN
and lockdep and abruptly stop an application using SoftRoCE during the
communication phase. For my system this often resulted in kernel crash of a
deadlock inside the kernel.
To fix the above mentioned problems, this patch does following things:
1. Replace tasklets with workqueues
2. Adds locks to kref_put
3. Aquires reference counting in an appropriate place
As a shortcomming, the performance is slightly reduced, because instead of
trying to execute tasklet function directly the new version always puts it onto
the queue.
TBH, I'm not sure that I removed all of the problems, but the driver
deffinetely behaves much more stable now. I would be glad to get some
help with additional testing.
drivers/infiniband/sw/rxe/rxe_comp.c | 38 ++----
drivers/infiniband/sw/rxe/rxe_cq.c | 17 ++-
drivers/infiniband/sw/rxe/rxe_hw_counters.c | 1 -
drivers/infiniband/sw/rxe/rxe_hw_counters.h | 1 -
drivers/infiniband/sw/rxe/rxe_loc.h | 3 +-
drivers/infiniband/sw/rxe/rxe_mcast.c | 22 ++--
drivers/infiniband/sw/rxe/rxe_mr.c | 10 +-
drivers/infiniband/sw/rxe/rxe_net.c | 21 ++-
drivers/infiniband/sw/rxe/rxe_pool.c | 40 ++++--
drivers/infiniband/sw/rxe/rxe_pool.h | 16 ++-
drivers/infiniband/sw/rxe/rxe_qp.c | 130 +++++++++---------
drivers/infiniband/sw/rxe/rxe_recv.c | 8 +-
drivers/infiniband/sw/rxe/rxe_req.c | 17 +--
drivers/infiniband/sw/rxe/rxe_resp.c | 54 ++++----
drivers/infiniband/sw/rxe/rxe_task.c | 139 +++++++-------------
drivers/infiniband/sw/rxe/rxe_task.h | 40 ++----
drivers/infiniband/sw/rxe/rxe_verbs.c | 81 ++++++------
drivers/infiniband/sw/rxe/rxe_verbs.h | 8 +-
18 files changed, 302 insertions(+), 344 deletions(-)
--
2.20.1
^ permalink raw reply
* BUG: KASAN: null-ptr-deref in rdma_counter_get_hwstat_value+0x19d/0x260 in for-next branch
From: Gal Pressman @ 2019-07-22 15:10 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Mark Zhang, Doug Ledford; +Cc: linux-rdma
Hi,
I pulled the latest for-next branch (5.3-rc1) which includes the new stats stuff
and applied a patch to enable EFA stats [1], and I'm getting the following trace
[2]. The EFA patch isn't merged yet so it could cause some extra noise, but this
did not happen before the core statistics patches were merged.
From a quick look it seems that 'port_counter->hstats' is only initialized for
ports 1..num_ports (i.e not initialized for port 0, device stats) in
rdma_counter_init rdma_for_each_port loop.
As a result, rdma_counter_get_hwstat_value hits a NULL pointer dereference when
querying device statistics as it tries to access an uninitialized hstats field in:
sum += port_counter->hstats->value[index];
I'm thinking of adding a check similar to the one that exists in
counter_history_stat_update and return 0 in case of !port_counter->hstats.
What do you guys think?
[1] https://patchwork.kernel.org/patch/11034123/
[2] cat /sys/class/infiniband/efa_0/hw_counters/completed_cmds
[ 82.519451] ==================================================================
[ 82.522782] BUG: KASAN: null-ptr-deref in
rdma_counter_get_hwstat_value+0x19d/0x260 [ib_core]
[ 82.526374] Read of size 8 at addr 00000000000000d0 by task cat/14604
[ 82.530133] CPU: 44 PID: 14604 Comm: cat Tainted: G E
5.3.0-rc1-dirty #101
[ 82.533613] Hardware name: Amazon EC2 c5n.18xlarge/, BIOS 1.0 10/16/2017
[ 82.536505] Call Trace:
[ 82.537837] dump_stack+0x91/0xeb
[ 82.539487] __kasan_report+0x1be/0x220
[ 82.541396] ? rdma_counter_get_hwstat_value+0x19d/0x260 [ib_core]
[ 82.544206] ? rdma_counter_get_hwstat_value+0x19d/0x260 [ib_core]
[ 82.546965] kasan_report+0xe/0x20
[ 82.548659] rdma_counter_get_hwstat_value+0x19d/0x260 [ib_core]
[ 82.552753] ? rdma_counter_query_stats+0x70/0x70 [ib_core]
[ 82.556629] ? lock_acquire+0x100/0x260
[ 82.559905] show_hw_stats+0xdc/0x1d0 [ib_core]
[ 82.563420] dev_attr_show+0x34/0x70
[ 82.566588] sysfs_kf_seq_show+0x12b/0x1c0
[ 82.569917] ? device_match_of_node+0x30/0x30
[ 82.573355] seq_read+0x171/0x6d0
[ 82.576415] vfs_read+0xc9/0x1e0
[ 82.579409] ksys_read+0xca/0x180
[ 82.582443] ? kernel_write+0xb0/0xb0
[ 82.585618] ? trace_hardirqs_on_thunk+0x1a/0x20
[ 82.589119] ? mark_held_locks+0x25/0xc0
[ 82.592387] ? do_syscall_64+0x14/0x2b0
[ 82.595648] do_syscall_64+0x68/0x2b0
[ 82.598886] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 82.602612] RIP: 0033:0x7fa96127afe0
[ 82.605800] Code: 0b 31 c0 48 83 c4 08 e9 be fe ff ff 48 8d 3d 17 bf 09 00 e8
52 8a 02 00 66 90 83 3d bd cf 2d 00 00 75 10 b8 00 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 31 c3 48 83 ec 08 e8 4e cc 01 00 48 89 04 24
[ 82.617434] RSP: 002b:00007ffc04ceea48 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 82.623423] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007fa96127afe0
[ 82.629319] RDX: 0000000000010000 RSI: 0000000000ebf000 RDI: 0000000000000003
[ 82.635142] RBP: 0000000000ebf000 R08: 0000000000000000 R09: 0000000000010fff
[ 82.641030] R10: 00007ffc04cede20 R11: 0000000000000246 R12: 0000000000ebf000
[ 82.646915] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff
[ 82.652804] ==================================================================
Thanks
^ permalink raw reply
* [PATCH net-next] [net-next] mlx4: avoid large stack usage in mlx4_init_hca()
From: Arnd Bergmann @ 2019-07-22 15:01 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller
Cc: Arnd Bergmann, Erez Alfasi, Jack Morgenstein, Eli Cohen,
Moshe Shemesh, Jiri Pirko, netdev, linux-rdma, linux-kernel,
clang-built-linux
The mlx4_dev_cap and mlx4_init_hca_param are really too large
to be put on the kernel stack, as shown by this clang warning:
drivers/net/ethernet/mellanox/mlx4/main.c:3304:12: error: stack frame size of 1088 bytes in function 'mlx4_load_one' [-Werror,-Wframe-larger-than=]
With gcc, the problem is the same, but it does not warn because
it does not inline this function, and therefore stays just below
the warning limit, while clang is just above it.
Use kzalloc for dynamic allocation instead of putting them
on stack. This gets the combined stack frame down to 424 bytes.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 66 +++++++++++++----------
1 file changed, 39 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 1f6e16d5ea6b..07c204bd3fc4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2292,23 +2292,31 @@ static int mlx4_init_fw(struct mlx4_dev *dev)
static int mlx4_init_hca(struct mlx4_dev *dev)
{
struct mlx4_priv *priv = mlx4_priv(dev);
+ struct mlx4_init_hca_param *init_hca = NULL;
+ struct mlx4_dev_cap *dev_cap = NULL;
struct mlx4_adapter adapter;
- struct mlx4_dev_cap dev_cap;
struct mlx4_profile profile;
- struct mlx4_init_hca_param init_hca;
u64 icm_size;
struct mlx4_config_dev_params params;
int err;
if (!mlx4_is_slave(dev)) {
- err = mlx4_dev_cap(dev, &dev_cap);
+ dev_cap = kzalloc(sizeof(*dev_cap), GFP_KERNEL);
+ init_hca = kzalloc(sizeof(*init_hca), GFP_KERNEL);
+
+ if (!dev_cap || !init_hca) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+
+ err = mlx4_dev_cap(dev, dev_cap);
if (err) {
mlx4_err(dev, "QUERY_DEV_CAP command failed, aborting\n");
- return err;
+ goto out_free;
}
- choose_steering_mode(dev, &dev_cap);
- choose_tunnel_offload_mode(dev, &dev_cap);
+ choose_steering_mode(dev, dev_cap);
+ choose_tunnel_offload_mode(dev, dev_cap);
if (dev->caps.dmfs_high_steer_mode == MLX4_STEERING_DMFS_A0_STATIC &&
mlx4_is_master(dev))
@@ -2331,48 +2339,48 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
MLX4_STEERING_MODE_DEVICE_MANAGED)
profile.num_mcg = MLX4_FS_NUM_MCG;
- icm_size = mlx4_make_profile(dev, &profile, &dev_cap,
- &init_hca);
+ icm_size = mlx4_make_profile(dev, &profile, dev_cap,
+ init_hca);
if ((long long) icm_size < 0) {
err = icm_size;
- return err;
+ goto out_free;
}
dev->caps.max_fmr_maps = (1 << (32 - ilog2(dev->caps.num_mpts))) - 1;
if (enable_4k_uar || !dev->persist->num_vfs) {
- init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
+ init_hca->log_uar_sz = ilog2(dev->caps.num_uars) +
PAGE_SHIFT - DEFAULT_UAR_PAGE_SHIFT;
- init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
+ init_hca->uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
} else {
- init_hca.log_uar_sz = ilog2(dev->caps.num_uars);
- init_hca.uar_page_sz = PAGE_SHIFT - 12;
+ init_hca->log_uar_sz = ilog2(dev->caps.num_uars);
+ init_hca->uar_page_sz = PAGE_SHIFT - 12;
}
- init_hca.mw_enabled = 0;
+ init_hca->mw_enabled = 0;
if (dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW ||
dev->caps.bmme_flags & MLX4_BMME_FLAG_TYPE_2_WIN)
- init_hca.mw_enabled = INIT_HCA_TPT_MW_ENABLE;
+ init_hca->mw_enabled = INIT_HCA_TPT_MW_ENABLE;
- err = mlx4_init_icm(dev, &dev_cap, &init_hca, icm_size);
+ err = mlx4_init_icm(dev, dev_cap, init_hca, icm_size);
if (err)
- return err;
+ goto out_free;
- err = mlx4_INIT_HCA(dev, &init_hca);
+ err = mlx4_INIT_HCA(dev, init_hca);
if (err) {
mlx4_err(dev, "INIT_HCA command failed, aborting\n");
goto err_free_icm;
}
- if (dev_cap.flags2 & MLX4_DEV_CAP_FLAG2_SYS_EQS) {
- err = mlx4_query_func(dev, &dev_cap);
+ if (dev_cap->flags2 & MLX4_DEV_CAP_FLAG2_SYS_EQS) {
+ err = mlx4_query_func(dev, dev_cap);
if (err < 0) {
mlx4_err(dev, "QUERY_FUNC command failed, aborting.\n");
goto err_close;
} else if (err & MLX4_QUERY_FUNC_NUM_SYS_EQS) {
- dev->caps.num_eqs = dev_cap.max_eqs;
- dev->caps.reserved_eqs = dev_cap.reserved_eqs;
- dev->caps.reserved_uars = dev_cap.reserved_uars;
+ dev->caps.num_eqs = dev_cap->max_eqs;
+ dev->caps.reserved_eqs = dev_cap->reserved_eqs;
+ dev->caps.reserved_uars = dev_cap->reserved_uars;
}
}
@@ -2381,14 +2389,13 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
* read HCA frequency by QUERY_HCA command
*/
if (dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS) {
- memset(&init_hca, 0, sizeof(init_hca));
- err = mlx4_QUERY_HCA(dev, &init_hca);
+ err = mlx4_QUERY_HCA(dev, init_hca);
if (err) {
mlx4_err(dev, "QUERY_HCA command failed, disable timestamp\n");
dev->caps.flags2 &= ~MLX4_DEV_CAP_FLAG2_TS;
} else {
dev->caps.hca_core_clock =
- init_hca.hca_core_clock;
+ init_hca->hca_core_clock;
}
/* In case we got HCA frequency 0 - disable timestamping
@@ -2464,7 +2471,8 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
priv->eq_table.inta_pin = adapter.inta_pin;
memcpy(dev->board_id, adapter.board_id, sizeof(dev->board_id));
- return 0;
+ err = 0;
+ goto out_free;
unmap_bf:
unmap_internal_clock(dev);
@@ -2483,6 +2491,10 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
if (!mlx4_is_slave(dev))
mlx4_free_icms(dev);
+out_free:
+ kfree(dev_cap);
+ kfree(init_hca);
+
return err;
}
--
2.20.0
^ permalink raw reply related
* Re: [PATCH for-next 0/3] {cxgb3, cxgb4, i40iw} Report phys_state
From: Jason Gunthorpe @ 2019-07-22 13:43 UTC (permalink / raw)
To: Kamal Heib
Cc: linux-rdma, Doug Ledford, Potnuri Bharat Teja, Faisal Latif,
Shiraz Saleem
In-Reply-To: <20190722070550.25395-1-kamalheib1@gmail.com>
On Mon, Jul 22, 2019 at 10:05:47AM +0300, Kamal Heib wrote:
> This series includes three patches that add the support for reporting
> physical state for the cxgb3, cxgb4, and i40iw drivers via sysfs.
>
> Kamal Heib (3):
> RDMA/cxgb3: Report phys_state in query_port
> RDMA/cxgb4: Report phys_state in query_port
> RDMA/i40iw: Report phys_state in query_port
>
> drivers/infiniband/hw/cxgb3/iwch_provider.c | 16 +++++++++++-----
> drivers/infiniband/hw/cxgb4/provider.c | 16 +++++++++++-----
> drivers/infiniband/hw/i40iw/i40iw_verbs.c | 7 +++++--
> 3 files changed, 27 insertions(+), 12 deletions(-)
Lets not have this generic iwapr code open coded please.
The core code already knows what the netdev is for the iWarp device.
Jason
^ permalink raw reply
* Re: rdma-core device memory leak
From: Jason Gunthorpe @ 2019-07-22 11:48 UTC (permalink / raw)
To: Gal Pressman
Cc: RDMA mailing list, Doug Ledford, Leon Romanovsky, Maor Gottlieb
In-Reply-To: <9c250b8c-df24-7491-deda-ede0b72fd689@amazon.com>
On Mon, Jul 22, 2019 at 11:10:51AM +0300, Gal Pressman wrote:
> Hi all,
>
> I'm seeing memory leaks when running tests with valgrind memcheck tool [1]. It
> seems like it's caused due to verbs_device refcount never reaching zero.
>
> Last related commit is 8125fdeb69bb ("verbs: Avoid ibv_device memory leak"),
> which seems like it should prevent this issue - but I'm not sure it covers all
> cases.
>
> When calling ibv_get_device_list, try_driver will eventually get called and set
> the device refcount to one. The refcount for each device will be increased when
> iterating the devices list, and on each verbs_init_context call.
>
> In the free flow, the refcount is decreased on verbs_uninit_context and when
> iterating the devices list - which brings the refcount back to one, as initially
> set by try_driver (hence uninit_device isn't called).
It is supposed to cache the device list in the library
(device.:device_list) and there is no function to cleanup the cache to
silence the valgrind warnings.
Jason
^ permalink raw reply
* Re: rdma-core device memory leak
From: Leon Romanovsky @ 2019-07-22 11:38 UTC (permalink / raw)
To: Gal Pressman
Cc: RDMA mailing list, Jason Gunthorpe, Doug Ledford, Maor Gottlieb
In-Reply-To: <394ed0b6-9f77-d951-5315-fe496d72f9d1@amazon.com>
On Mon, Jul 22, 2019 at 02:21:18PM +0300, Gal Pressman wrote:
> On 22/07/2019 12:15, Leon Romanovsky wrote:
> > On Mon, Jul 22, 2019 at 11:10:51AM +0300, Gal Pressman wrote:
> >> Hi all,
> >>
> >> I'm seeing memory leaks when running tests with valgrind memcheck tool [1]. It
> >> seems like it's caused due to verbs_device refcount never reaching zero.
> >>
> >> Last related commit is 8125fdeb69bb ("verbs: Avoid ibv_device memory leak"),
> >> which seems like it should prevent this issue - but I'm not sure it covers all
> >> cases.
> >>
> >> When calling ibv_get_device_list, try_driver will eventually get called and set
> >> the device refcount to one. The refcount for each device will be increased when
> >> iterating the devices list, and on each verbs_init_context call.
> >>
> >> In the free flow, the refcount is decreased on verbs_uninit_context and when
> >> iterating the devices list - which brings the refcount back to one, as initially
> >> set by try_driver (hence uninit_device isn't called).
> >>
> >> Is there a reason for initializing refcount to one instead of zero? According to
> >> the cited commit the reference count should be decreased when the device no
> >> longer exists in the sysfs, but the device isn't necessarily removed from the sysfs.
> >
> > Such scheme allows us to avoid rdma-core provider reinitialization every
> > time application "plays" with ibv_get_device_list(). Anyway, the rdma-core
> > library (libibverbs) won't be unloaded till dclose() is called and glibc
> > reference count won't reach zero, so we don't need to release provider
> > till that point of time too.
>
> So you consider these valgrind errors false alarms?
Yes, valgrind checks executed code and unlikely to check unload sequence.
In your case, the unload code wasn't called at all.
Thanks
^ permalink raw reply
* Re: rdma-core device memory leak
From: Gal Pressman @ 2019-07-22 11:21 UTC (permalink / raw)
To: Leon Romanovsky
Cc: RDMA mailing list, Jason Gunthorpe, Doug Ledford, Maor Gottlieb
In-Reply-To: <20190722091523.GC5125@mtr-leonro.mtl.com>
On 22/07/2019 12:15, Leon Romanovsky wrote:
> On Mon, Jul 22, 2019 at 11:10:51AM +0300, Gal Pressman wrote:
>> Hi all,
>>
>> I'm seeing memory leaks when running tests with valgrind memcheck tool [1]. It
>> seems like it's caused due to verbs_device refcount never reaching zero.
>>
>> Last related commit is 8125fdeb69bb ("verbs: Avoid ibv_device memory leak"),
>> which seems like it should prevent this issue - but I'm not sure it covers all
>> cases.
>>
>> When calling ibv_get_device_list, try_driver will eventually get called and set
>> the device refcount to one. The refcount for each device will be increased when
>> iterating the devices list, and on each verbs_init_context call.
>>
>> In the free flow, the refcount is decreased on verbs_uninit_context and when
>> iterating the devices list - which brings the refcount back to one, as initially
>> set by try_driver (hence uninit_device isn't called).
>>
>> Is there a reason for initializing refcount to one instead of zero? According to
>> the cited commit the reference count should be decreased when the device no
>> longer exists in the sysfs, but the device isn't necessarily removed from the sysfs.
>
> Such scheme allows us to avoid rdma-core provider reinitialization every
> time application "plays" with ibv_get_device_list(). Anyway, the rdma-core
> library (libibverbs) won't be unloaded till dclose() is called and glibc
> reference count won't reach zero, so we don't need to release provider
> till that point of time too.
So you consider these valgrind errors false alarms?
^ permalink raw reply
* Re: [PATCH 2/3] net/xdp: convert put_page() to put_user_page*()
From: Christoph Hellwig @ 2019-07-22 9:34 UTC (permalink / raw)
To: john.hubbard
Cc: Andrew Morton, Alexander Viro, Björn Töpel,
Boaz Harrosh, Christoph Hellwig, Daniel Vetter, Dan Williams,
Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard
In-Reply-To: <20190722043012.22945-3-jhubbard@nvidia.com>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..9cbbb96c2a32 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -171,8 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> for (i = 0; i < umem->npgs; i++) {
> struct page *page = umem->pgs[i];
>
> - set_page_dirty_lock(page);
> - put_page(page);
> + put_user_pages_dirty_lock(&page, 1);
Same here, we really should avoid the need for the loop here and
do the looping inside the helper.
^ permalink raw reply
* Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: Christoph Hellwig @ 2019-07-22 9:33 UTC (permalink / raw)
To: john.hubbard
Cc: Andrew Morton, Alexander Viro, Björn Töpel,
Boaz Harrosh, Christoph Hellwig, Daniel Vetter, Dan Williams,
Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard
In-Reply-To: <20190722043012.22945-2-jhubbard@nvidia.com>
On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubbard@gmail.com wrote:
> for (i = 0; i < vsg->num_pages; ++i) {
> if (NULL != (page = vsg->pages[i])) {
> if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
> - SetPageDirty(page);
> - put_page(page);
> + put_user_pages_dirty(&page, 1);
> + else
> + put_user_page(page);
> }
Can't just pass a dirty argument to put_user_pages? Also do we really
need a separate put_user_page for the single page case?
put_user_pages_dirty?
Also the PageReserved check looks bogus, as I can't see how a reserved
page can end up here. So IMHO the above snippled should really look
something like this:
put_user_pages(vsg->pages[i], vsg->num_pages,
vsg->direction == DMA_FROM_DEVICE);
in the end.
^ permalink raw reply
* Re: rdma-core device memory leak
From: Leon Romanovsky @ 2019-07-22 9:15 UTC (permalink / raw)
To: Gal Pressman
Cc: RDMA mailing list, Jason Gunthorpe, Doug Ledford, Maor Gottlieb
In-Reply-To: <9c250b8c-df24-7491-deda-ede0b72fd689@amazon.com>
On Mon, Jul 22, 2019 at 11:10:51AM +0300, Gal Pressman wrote:
> Hi all,
>
> I'm seeing memory leaks when running tests with valgrind memcheck tool [1]. It
> seems like it's caused due to verbs_device refcount never reaching zero.
>
> Last related commit is 8125fdeb69bb ("verbs: Avoid ibv_device memory leak"),
> which seems like it should prevent this issue - but I'm not sure it covers all
> cases.
>
> When calling ibv_get_device_list, try_driver will eventually get called and set
> the device refcount to one. The refcount for each device will be increased when
> iterating the devices list, and on each verbs_init_context call.
>
> In the free flow, the refcount is decreased on verbs_uninit_context and when
> iterating the devices list - which brings the refcount back to one, as initially
> set by try_driver (hence uninit_device isn't called).
>
> Is there a reason for initializing refcount to one instead of zero? According to
> the cited commit the reference count should be decreased when the device no
> longer exists in the sysfs, but the device isn't necessarily removed from the sysfs.
Such scheme allows us to avoid rdma-core provider reinitialization every
time application "plays" with ibv_get_device_list(). Anyway, the rdma-core
library (libibverbs) won't be unloaded till dclose() is called and glibc
reference count won't reach zero, so we don't need to release provider
till that point of time too.
Thanks
^ permalink raw reply
* rdma-core device memory leak
From: Gal Pressman @ 2019-07-22 8:10 UTC (permalink / raw)
To: RDMA mailing list
Cc: Jason Gunthorpe, Doug Ledford, Leon Romanovsky, Maor Gottlieb
Hi all,
I'm seeing memory leaks when running tests with valgrind memcheck tool [1]. It
seems like it's caused due to verbs_device refcount never reaching zero.
Last related commit is 8125fdeb69bb ("verbs: Avoid ibv_device memory leak"),
which seems like it should prevent this issue - but I'm not sure it covers all
cases.
When calling ibv_get_device_list, try_driver will eventually get called and set
the device refcount to one. The refcount for each device will be increased when
iterating the devices list, and on each verbs_init_context call.
In the free flow, the refcount is decreased on verbs_uninit_context and when
iterating the devices list - which brings the refcount back to one, as initially
set by try_driver (hence uninit_device isn't called).
Is there a reason for initializing refcount to one instead of zero? According to
the cited commit the reference count should be decreased when the device no
longer exists in the sysfs, but the device isn't necessarily removed from the sysfs.
[1]
==35758== HEAP SUMMARY:
==35758== in use at exit: 27,777 bytes in 88 blocks
==35758== total heap usage: 295 allocs, 207 frees, 141,751 bytes allocated
==35758==
==35758== 728 bytes in 1 blocks are possibly lost in loss record 3 of 8
==35758== at 0x4C2A935: calloc (vg_replace_malloc.c:711)
==35758== by 0x6FF263F: efa_device_alloc (efa.c:161)
==35758== by 0x4E42B67: try_driver (init.c:365)
==35758== by 0x4E42DBE: try_drivers (init.c:429)
==35758== by 0x4E42DBE: try_all_drivers (init.c:519)
==35758== by 0x4E43798: ibverbs_get_device_list (init.c:584)
==35758== by 0x4E40870: ibv_get_device_list@@IBVERBS_1.1 (device.c:74)
==35758== by 0x400691: main (device_list.c:46)
==35758==
==35758== 1,048 bytes in 1 blocks are possibly lost in loss record 6 of 8
==35758== at 0x4C2A935: calloc (vg_replace_malloc.c:711)
==35758== by 0x4E425C5: find_sysfs_devs_nl_cb (ibdev_nl.c:156)
==35758== by 0x5697E4B: nl_recvmsgs_report (in /usr/lib64/libnl-3.so.200.23.0)
==35758== by 0x56982B8: nl_recvmsgs (in /usr/lib64/libnl-3.so.200.23.0)
==35758== by 0x4E4734F: rdmanl_get_devices (rdma_nl.c:96)
==35758== by 0x4E42726: find_sysfs_devs_nl (ibdev_nl.c:205)
==35758== by 0x4E43501: ibverbs_get_device_list (init.c:538)
==35758== by 0x4E40870: ibv_get_device_list@@IBVERBS_1.1 (device.c:74)
==35758== by 0x400691: main (device_list.c:46)
Thanks,
Gal
^ permalink raw reply
* Re: [PATCH 2/8] scsi: take the DMA max mapping size into account
From: Damien Le Moal @ 2019-07-22 7:40 UTC (permalink / raw)
To: Ming Lei, Bart van Assche
Cc: Christoph Hellwig, Martin K . Petersen, Sagi Grimberg,
Max Gurtovoy, linux-rdma, Linux SCSI List,
megaraidlinux.pdl@broadcom.com, MPT-FusionLinux.pdl@broadcom.com,
linux-hyperv@vger.kernel.org, Linux Kernel Mailing List
In-Reply-To: <CACVXFVMWM3xg6EEyoyNjkLPv=8+wQKiHj6erMS_gGX25f-Ot4g@mail.gmail.com>
On 2019/07/22 15:01, Ming Lei wrote:
> On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <bvanassche@acm.org> wrote:
>>
>> On 6/17/19 5:19 AM, Christoph Hellwig wrote:
>>> We need to limit the devices max_sectors to what the DMA mapping
>>> implementation can support. If not we risk running out of swiotlb
>>> buffers easily.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> ---
>>> drivers/scsi/scsi_lib.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index d333bb6b1c59..f233bfd84cd7 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>>> blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
>>> }
>>>
>>> + shost->max_sectors = min_t(unsigned int, shost->max_sectors,
>>> + dma_max_mapping_size(dev) << SECTOR_SHIFT);
>>> blk_queue_max_hw_sectors(q, shost->max_sectors);
>>> if (shost->unchecked_isa_dma)
>>> blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
>>
>> Does dma_max_mapping_size() return a value in bytes? Is
>> shost->max_sectors a number of sectors? If so, are you sure that "<<
>> SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
>> SECTOR_SHIFT" instead?
>
> Now the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.
>
> Also the following kernel oops is triggered on qemu, and looks
> device->dma_mask is NULL.
Just hit the exact same problem using tcmu-runner (ZBC file handler) on bare
metal (no QEMU). dev->dma_mask is NULL. No problem with real disks though.
>
> [ 5.826483] scsi host0: Virtio SCSI HBA
> [ 5.829302] st: Version 20160209, fixed bufsize 32768, s/g segs 256
> [ 5.831042] SCSI Media Changer driver v0.25
> [ 5.832491] ==================================================================
> [ 5.833332] BUG: KASAN: null-ptr-deref in
> dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] Read of size 8 at addr 0000000000000000 by task kworker/u17:0/7
> [ 5.835506] nvme nvme0: pci function 0000:00:07.0
> [ 5.833332]
> [ 5.833332] CPU: 2 PID: 7 Comm: kworker/u17:0 Not tainted 5.3.0-rc1 #1328
> [ 5.836999] ahci 0000:00:1f.2: version 3.0
> [ 5.833332] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS ?-20180724_192412-buildhw-07.phx4
> [ 5.833332] Workqueue: events_unbound async_run_entry_fn
> [ 5.833332] Call Trace:
> [ 5.833332] dump_stack+0x6f/0x9d
> [ 5.833332] ? dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] __kasan_report+0x161/0x189
> [ 5.833332] ? dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] kasan_report+0xe/0x12
> [ 5.833332] dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] __scsi_init_queue+0xd8/0x1f3
> [ 5.833332] scsi_mq_alloc_queue+0x62/0x89
> [ 5.833332] scsi_alloc_sdev+0x38c/0x479
> [ 5.833332] scsi_probe_and_add_lun+0x22d/0x1093
> [ 5.833332] ? kobject_set_name_vargs+0xa4/0xb2
> [ 5.833332] ? mutex_lock+0x88/0xc4
> [ 5.833332] ? scsi_free_host_dev+0x4a/0x4a
> [ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
> [ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
> [ 5.833332] ? ata_tdev_match+0x22/0x45
> [ 5.833332] ? attribute_container_add_device+0x160/0x17e
> [ 5.833332] ? rpm_resume+0x26a/0x7c0
> [ 5.833332] ? kobject_get+0x12/0x43
> [ 5.833332] ? rpm_put_suppliers+0x7e/0x7e
> [ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
> [ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
> [ 5.833332] ? scsi_target_destroy+0x135/0x135
> [ 5.833332] __scsi_scan_target+0x14b/0x6aa
> [ 5.833332] ? pvclock_clocksource_read+0xc0/0x14e
> [ 5.833332] ? scsi_add_device+0x20/0x20
> [ 5.833332] ? rpm_resume+0x1ae/0x7c0
> [ 5.833332] ? rpm_put_suppliers+0x7e/0x7e
> [ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
> [ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
> [ 5.833332] ? pick_next_task_fair+0x976/0xa3d
> [ 5.833332] ? mutex_lock+0x88/0xc4
> [ 5.833332] scsi_scan_channel+0x76/0x9e
> [ 5.833332] scsi_scan_host_selected+0x131/0x176
> [ 5.833332] ? scsi_scan_host+0x241/0x241
> [ 5.833332] do_scan_async+0x27/0x219
> [ 5.833332] ? scsi_scan_host+0x241/0x241
> [ 5.833332] async_run_entry_fn+0xdc/0x23d
> [ 5.833332] process_one_work+0x327/0x539
> [ 5.833332] worker_thread+0x330/0x492
> [ 5.833332] ? rescuer_thread+0x41f/0x41f
> [ 5.833332] kthread+0x1c6/0x1d5
> [ 5.833332] ? kthread_park+0xd3/0xd3
> [ 5.833332] ret_from_fork+0x1f/0x30
> [ 5.833332] ==================================================================
>
>
>
> Thanks,
> Ming Lei
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply
* [PATCH for-next 3/3] RDMA/i40iw: Report phys_state in query_port
From: Kamal Heib @ 2019-07-22 7:05 UTC (permalink / raw)
To: linux-rdma
Cc: Doug Ledford, Jason Gunthorpe, Potnuri Bharat Teja, Faisal Latif,
Shiraz Saleem, Kamal Heib
In-Reply-To: <20190722070550.25395-1-kamalheib1@gmail.com>
Add support for reporting physical state when calling query_port() via
the sysfs interface.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
---
drivers/infiniband/hw/i40iw/i40iw_verbs.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index d169a8031375..c678b05bb829 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -105,10 +105,13 @@ static int i40iw_query_port(struct ib_device *ibdev,
props->active_mtu = ib_mtu_int_to_enum(netdev->mtu);
props->lid = 1;
- if (netif_carrier_ok(iwdev->netdev))
+ if (netif_carrier_ok(iwdev->netdev)) {
props->state = IB_PORT_ACTIVE;
- else
+ props->phys_state = 5;
+ } else {
props->state = IB_PORT_DOWN;
+ props->phys_state = 3;
+ }
props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_REINIT_SUP |
IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP;
props->gid_tbl_len = 1;
--
2.20.1
^ permalink raw reply related
* [PATCH for-next 2/3] RDMA/cxgb4: Report phys_state in query_port
From: Kamal Heib @ 2019-07-22 7:05 UTC (permalink / raw)
To: linux-rdma
Cc: Doug Ledford, Jason Gunthorpe, Potnuri Bharat Teja, Faisal Latif,
Shiraz Saleem, Kamal Heib
In-Reply-To: <20190722070550.25395-1-kamalheib1@gmail.com>
Add support for reporting physical state when calling query_port() via
the sysfs interface.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
---
drivers/infiniband/hw/cxgb4/provider.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 5e59c5708729..d38c242a1693 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -317,18 +317,24 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
props->max_mtu = IB_MTU_4096;
props->active_mtu = ib_mtu_int_to_enum(netdev->mtu);
- if (!netif_carrier_ok(netdev))
+ if (!netif_carrier_ok(netdev)) {
props->state = IB_PORT_DOWN;
- else {
+ props->phys_state = 3;
+ } else {
inetdev = in_dev_get(netdev);
if (inetdev) {
- if (inetdev->ifa_list)
+ if (inetdev->ifa_list) {
props->state = IB_PORT_ACTIVE;
- else
+ props->phys_state = 5;
+ } else {
props->state = IB_PORT_INIT;
+ props->phys_state = 4;
+ }
in_dev_put(inetdev);
- } else
+ } else {
props->state = IB_PORT_INIT;
+ props->phys_state = 4;
+ }
}
props->port_cap_flags =
--
2.20.1
^ permalink raw reply related
* [PATCH for-next 1/3] RDMA/cxgb3: Report phys_state in query_port
From: Kamal Heib @ 2019-07-22 7:05 UTC (permalink / raw)
To: linux-rdma
Cc: Doug Ledford, Jason Gunthorpe, Potnuri Bharat Teja, Faisal Latif,
Shiraz Saleem, Kamal Heib
In-Reply-To: <20190722070550.25395-1-kamalheib1@gmail.com>
Add support for reporting physical state when calling query_port() via
the sysfs interface.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
---
drivers/infiniband/hw/cxgb3/iwch_provider.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index e775c1a1a450..7b0b7bfc570a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1004,18 +1004,24 @@ static int iwch_query_port(struct ib_device *ibdev,
props->max_mtu = IB_MTU_4096;
props->active_mtu = ib_mtu_int_to_enum(netdev->mtu);
- if (!netif_carrier_ok(netdev))
+ if (!netif_carrier_ok(netdev)) {
props->state = IB_PORT_DOWN;
- else {
+ props->phys_state = 3;
+ } else {
inetdev = in_dev_get(netdev);
if (inetdev) {
- if (inetdev->ifa_list)
+ if (inetdev->ifa_list) {
props->state = IB_PORT_ACTIVE;
- else
+ props->phys_state = 5;
+ } else {
props->state = IB_PORT_INIT;
+ props->phys_state = 4;
+ }
in_dev_put(inetdev);
- } else
+ } else {
props->state = IB_PORT_INIT;
+ props->phys_state = 4;
+ }
}
props->port_cap_flags =
--
2.20.1
^ permalink raw reply related
* [PATCH for-next 0/3] {cxgb3, cxgb4, i40iw} Report phys_state
From: Kamal Heib @ 2019-07-22 7:05 UTC (permalink / raw)
To: linux-rdma
Cc: Doug Ledford, Jason Gunthorpe, Potnuri Bharat Teja, Faisal Latif,
Shiraz Saleem, Kamal Heib
This series includes three patches that add the support for reporting
physical state for the cxgb3, cxgb4, and i40iw drivers via sysfs.
Kamal Heib (3):
RDMA/cxgb3: Report phys_state in query_port
RDMA/cxgb4: Report phys_state in query_port
RDMA/i40iw: Report phys_state in query_port
drivers/infiniband/hw/cxgb3/iwch_provider.c | 16 +++++++++++-----
drivers/infiniband/hw/cxgb4/provider.c | 16 +++++++++++-----
drivers/infiniband/hw/i40iw/i40iw_verbs.c | 7 +++++--
3 files changed, 27 insertions(+), 12 deletions(-)
--
2.20.1
^ permalink raw reply
* Re: [PATCH 2/8] scsi: take the DMA max mapping size into account
From: Dexuan-Linux Cui @ 2019-07-22 6:18 UTC (permalink / raw)
To: Ming Lei
Cc: Bart Van Assche, Christoph Hellwig, Martin K . Petersen,
Sagi Grimberg, Max Gurtovoy, linux-rdma, Linux SCSI List,
megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
Linux Kernel Mailing List, Dexuan Cui, v-lide
In-Reply-To: <CACVXFVMWM3xg6EEyoyNjkLPv=8+wQKiHj6erMS_gGX25f-Ot4g@mail.gmail.com>
On Sun, Jul 21, 2019 at 11:01 PM Ming Lei <tom.leiming@gmail.com> wrote:
>
> On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> > > We need to limit the devices max_sectors to what the DMA mapping
> > > implementation can support. If not we risk running out of swiotlb
> > > buffers easily.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > > drivers/scsi/scsi_lib.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index d333bb6b1c59..f233bfd84cd7 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> > > blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
> > > }
> > >
> > > + shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> > > + dma_max_mapping_size(dev) << SECTOR_SHIFT);
> > > blk_queue_max_hw_sectors(q, shost->max_sectors);
> > > if (shost->unchecked_isa_dma)
> > > blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
> >
> > Does dma_max_mapping_size() return a value in bytes? Is
> > shost->max_sectors a number of sectors? If so, are you sure that "<<
> > SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
> > SECTOR_SHIFT" instead?
>
> Now the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.
>
> Also the following kernel oops is triggered on qemu, and looks
> device->dma_mask is NULL.
>
> Ming Lei
FYI: we also see the panic with a Linux kernel 5.2.0-next-20190719
running on Hyper-V:
[ 7.429053] RIP: 0010:dma_direct_max_mapping_size+0x26/0x80
[ 7.429053] Code: 0f b6 c0 c3 0f 1f 44 00 00 55 48 89 e5 41 54 53
48 89 fb e8 4c 14 00 00 84 c0 74 45 48 8b 83 28 02 00 00 4c 8b a3 38
02 00 00 <48> 8b 00 48 85 c0 74 0c 4d 85 e4 74 36 49 39 c4 4c 0f 47 e0
48 89
[ 7.429053] RSP: 0018:ffffc1d5005efbc0 EFLAGS: 00010202
[ 7.429053] RAX: 0000000000000000 RBX: ffff9cf86d24c428 RCX: 0000000000000000
[ 7.429053] RDX: ffff9cf86d12dd00 RSI: 0000000000000200 RDI: ffff9cf86d24c428
[ 7.429053] RBP: ffffc1d5005efbd0 R08: ffff9cf86fcaf0e0 R09: ffff9cf86e0072c0
[ 7.429053] R10: ffffc1d5005efa70 R11: 00000000000301a0 R12: 0000000000000000
[ 7.429053] R13: ffff9cf86d24c428 R14: 0000000000000400 R15: ffff9cf825cff000
[ 7.429053] FS: 0000000000000000(0000) GS:ffff9cf86fc80000(0000)
knlGS:0000000000000000
[ 7.429053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.429053] CR2: 0000000000000000 CR3: 00000003c700a001 CR4: 00000000003606e0
[ 7.456569] NET: Registered protocol family 17
[ 7.429053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.469803] Key type dns_resolver registered
[ 7.429053] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7.429053] Call Trace:
[ 7.429053] dma_max_mapping_size+0x39/0x50
[ 7.429053] __scsi_init_queue+0x7f/0x140
[ 7.429053] scsi_mq_alloc_queue+0x38/0x60
[ 7.429053] scsi_alloc_sdev+0x1da/0x2b0
[ 7.429053] scsi_probe_and_add_lun+0x471/0xe60
[ 7.429053] __scsi_scan_target+0xfc/0x610
[ 7.429053] scsi_scan_channel+0x66/0xa0
[ 7.429053] scsi_scan_host_selected+0xf3/0x160
[ 7.429053] do_scsi_scan_host+0x93/0xa0
[ 7.429053] do_scan_async+0x1c/0x190
[ 7.429053] async_run_entry_fn+0x3c/0x150
[ 7.429053] process_one_work+0x1f7/0x3f0
[ 7.429053] worker_thread+0x34/0x400
[ 7.429053] kthread+0x121/0x140
[ 7.429053] ret_from_fork+0x35/0x40
[ 7.429053] Modules linked in:
[ 7.429053] CR2: 0000000000000000
[ 7.766122] BUG: kernel NULL pointer dereference, address: 0000000000000000
Thanks,
-- Dexuan
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox