* [PATCH v5 0/1] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path
@ 2010-08-17 20:36 Ralph Campbell
[not found] ` <20100817203619.22174.62871.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Ralph Campbell @ 2010-08-17 20:36 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hopefully, this is the last update needed to this patch and
it can be applied. It has been in use within QLogic for several months
without problems.
Changes from v4 to v5:
Removed the "break;" in ipoib_cm_flush_path() so now the function
comment matches the code.
Updated the error case code in ipoib_cm_tx_start() so the address
handle and cached GID are reset so the next call to ipoib_start_xmit()
will create a new CM connection.
Added netif_tx_lock_bh() in ipoib_neigh_cleanup()
in order to call ipoib_cm_destroy_tx().
I don't think it is strictly needed, I'm just being paranoid and I
don't think it is a performance path when deleting struct neighbour.
=================
These are my notes on the IPoIB locking and what I figured out
in the process of creating the patch that follows.
IPoIB connected mode uses a separate QP for transmit and receive.
I will only talk about the transmit side although the some of the
data structures are used by both.
The executive summary for locking is that most things are protected
by the struct ipoib_dev_priv.lock. The network device lock,
netif_tx_lock(), netif_tx_lock_bh(), or stopping network output via
netif_stop_queue() is used to prevent ipoib_start_xmit() and
ipoib_cm_send() from being called which can access struct ipoib_neigh
and struct ipoib_cm_tx without holding locks.
struct sk_buff {
// This pointer may hold a reference (see SKB_DST_NOREF).
// IPoIB doesn't change this pointer so the locking rules aren't important.
unsigned long _skb_refdst;
}
struct dst_entry {
// The neighbour pointer holds a reference.
// IPoIB doesn't change this pointer so the locking rules aren't important.
struct neighbour *neighbour;
atomic_t refcnt;
}
struct neighbour {
// stores the IPoIB "hardware" address:
// (control flags (one byte), QPN (3 bytes), destination GID (16 bytes),
// padding (0 or 4 bytes), and a pointer to struct ipoib_neigh (4 or 8 bytes)
// which is not reference counted.
// It is protected by calling netif_tx_lock(dev) or netif_stop_queue(dev).
// The Linux network stack can free the ipoib_neigh pointer by calling
// ipoib_neigh_cleanup()
uchar ha[];
struct neigh_ops *ops;
atomic_t refcnt;
}
struct ipoib_neigh {
// This is protected by priv->lock and *does not* hold a reference
struct neighbour *neighbour; // back pointer to containing struct
// This is protected by priv->lock although it is accessed w/o
// holding the priv->lock in ipoib_start_xmit() which means that
// to clear the pointer, ipoib_start_xmit() has to be prevented from
// being called if there is a chance that
// "to_ipoib_neigh(skb_dst(skb)->neighbour)" could point to this struct.
struct ipoib_cm_tx *cm;
// This is protected by priv->lock and holds a reference
struct ipoib_ah *ah;
// Link for path->neigh_list or mcast->neigh_list.
// This is protected by priv->lock.
struct list_head list;
}
struct ipoib_cm_tx {
// This is protected by priv->lock.
struct ipoib_neigh *neigh;
}
struct ipoib_path {
struct ib_sa_query *query; // non-NULL if SA path record query is pending
// This is protected by priv->lock and holds a reference
struct ipoib_ah *ah;
struct completion done; // True if query is finished
// list of all struct ipoib_neigh.list with the same ah pointer
struct list_head neigh_list;
}
struct ipoib_dev_priv {
// This lock protects a number of things associated with this
// IPoIB network device.
spinlock_t lock;
// Contains the struct ipoib_mcast nodes indexed by MGID.
// It is protected by priv->lock.
struct rb_root multicast_tree;
}
Before any nodes can send IPoIB packets, the SA creates the
"all HCA multicast group GID" which encodes <flag>, <scope>,
<IPv4/IPv6>, <pkey>. For example, transient, link local scope, IPv4,
pkey of 0x8001, limited broadcast address we have:
FF12:401B:8001::FFFF:FFFF, in compressed format.
The group also has a P_Key, Q_Key, and MTU associated with it in the SA
path record.
This multicast group is used for IPv4 address resolution (ARP).
The group is joined when the IPoIB device is brought up and the details
saved in priv->broadcast.
The transmit process starts with the Linux networking code calling
netif_tx_lock(dev) and then calling:
ipoib_hard_header()
// This prepends the "hardware address" onto the sk_buff header if
// the neighbor address hasn't been set in the sk_buff.
ipoib_start_xmit(skb, dev)
// The first time through, skb_dst(skb)->neighbour won't be set and
// ipoib_hard_header() will have prepended the IPv4 broadcast
// "hardware address" which we strip off and call the following.
ipoib_mcast_send(dev, mgid, skb)
spin_lock_irqsave(&priv->lock, flags)
// Search for the mgid and create an entry if not found.
mcast = __ipoib_mcast_find(dev, mgid)
// Since mgid is probably the "all HCA multicast group GID" which
// was initialized when the network interface was started, we call:
spin_unlock_irqrestore(&priv->lock, flags)
// XXX bug? using mcast->ah after unlock w/o incref on ah?
ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN)
// Eventually, we get an ARP reply, the struct neighbour.ha[] is filled
// in with the destination's "hardware address" and ipoib_start_xmit()
// is called again:
ipoib_start_xmit(skb, dev)
// This time, if (likely(skb_dst(skb) && skb_dst(skb)->neighbour)) is true
neigh = *to_ipoib_neigh(skb_dst(skb)->neighbour)
// neigh is NULL so we allocate a struct ipoib_neigh:
neigh = neigh_alloc(skb_dst(skb)->neighbour, skb->dev)
path_lookup(skb, dev, neigh)
// Assume it is a unicast address
neigh_add_path(skb, dev, neigh)
spin_lock_irqsave(&priv->lock, flags)
// Find or allocate a path record entry
path = __path_find(dev, n->ha + 4)
if (!path)
path = path_rec_create(dev, n->ha + 4)
// path->ah will be NULL the first time through so:
path_rec_start(dev, path)
// Start the path record lookup process.
// When it is finished, path_rec_completion() will be called.
spin_unlock_irqrestore(&priv->lock, flags)
// Eventually, path_rec_completion() will be called when the SA replies.
path_rec_completion()
// Create address handle from the reply
ah = ipoib_create_ah()
spin_lock_irqsave(&priv->lock, flags)
// for each ipoib_neigh in path->neigh_list
// set neigh->ah
// create connected mode QP if enabled
ipoib_cm_set(neigh, ipoib_cm_create_tx())
// Allocate a struct ipoib_cm_tx.
tx = kzalloc()
// Add it to the list of structs waiting to be processed.
list_add(&tx->list, &priv->cm.start_list)
// At this point, tx->neigh != NULL so ipoib_cm_destroy_tx(tx)
// will have an effect.
// Schedule ipoib_cm_tx_start() to be called by work thread
queue_work(ipoib_workqueue, &priv->cm.start_task)
// Mark path as "done" doing the SA path record query process
path->query = NULL
complete(&path->done)
spin_unlock_irqrestore(&priv->lock, flags)
// Requeue waiting skbs if any
while ((skb = __skb_dequeue(&skqueue)))
dev_queue_xmit(skb)
// Some time later, ipoib_cm_tx_start() is called from the workqueue thread
ipoib_cm_tx_start()
// netif_tx_lock_bh() is needed because we may set neigh->cm = NULL
netif_tx_lock_bh(dev)
spin_lock_irqsave(&priv->lock, flags)
while (!list_empty(&priv->cm.start_list))
// Remove tx from priv->cm.start_list
list_del_init(&p->list)
// Save the QPN and path record (destination address)
spin_unlock_irqrestore(&priv->lock, flags)
netif_tx_unlock_bh(dev)
ipoib_cm_tx_init()
// Create and initialize QP
p->qp = ipoib_cm_create_tx_qp()
// Create CM state
ib_create_cm_id(ipoib_cm_tx_handler)
// Call CM to start the process of establishing the RC QP connection
ipoib_cm_send_req()
netif_tx_lock_bh(dev)
spin_lock_irqsave(&priv->lock, flags)
// We need the lock to check p->neigh in case ipoib_cm_destroy_tx()
// was called but can't call ipoib_cm_tx_destroy() w/ the lock held so
// put it on the priv->reap_list if there was an error.
spin_unlock_irqrestore(&priv->lock, flags)
netif_tx_unlock_bh(dev)
// Eventually, CM calls ipoib_cm_tx_handler to say what parameters to
// use to modify the QP to RTS
ipoib_cm_tx_handler()
ipoib_cm_rep_handler()
// Use CM parameters to complete RC QP initialization
ib_modify_qp()
// This allows ipoib_start_xmit() to call ipoib_cm_send()
set_bit(IPOIB_FLAG_OPER_UP, &p->flags)
// XXX Hmm, race between ipoib_start_xmit() setting UP flag and
// then getting the priv lock to queue the skb if UP wasn't set.
// Requeued skb will call:
ipoib_start_xmit(skb, dev)
// This time all the checks pass so we call:
ipoib_cm_send()
post_send()
ib_post_send()
// Some time later, we get a send completion
ipoib_cm_handle_tx_wc()
// Note that the completion contains a pointer to the QP
// which contains a pointer to the struct ipoib_cm_tx.
// This means that when destroying a RC QP, we need to wait for completions
// to drain before calling kfree(tx). See ipoib_cm_tx_destroy().
struct ipoib_cm_tx *tx = wc->qp->qp_context;
ib_dma_unmap_single()
dev_kfree_skb_any()
// We lock the network device transmit since ipoib_cm_destroy_tx()
// might be called and change neigh->cm = NULL.
// Also, it protects tx->tx_tail, tx->tx_head, and ipoib_cm_send().
netif_tx_lock(dev)
if wc error
spin_lock_irqsave(&priv->lock, flags)
// This prevents ipoib_cm_send(tx) from being called.
// XXX not really needed since we hold the netif_tx_lock() and
// will clear neigh->cm.
clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags)
ipoib_cm_destroy_tx(tx)
// Since there can be many error completions queued,
// tx->neigh != NULL means this is the first time this tx is being
// destroyed
struct ipoib_neigh *neigh = tx->neigh
if (!neigh) return;
neigh->cm = NULL;
tx->neigh = NULL;
// if ipoib_cm_tx_start() is not actively using this tx,
// put on tx reap list
spin_unlock_irqrestore(&priv->lock, flags)
netif_tx_unlock(dev)
// This is called by the workqueue to reap struct ipoib_cm_tx
ipoib_cm_tx_reap()
// This protects struct ipoib_cm_tx
netif_tx_lock_bh(dev);
spin_lock_irqsave(&priv->lock, flags);
while (!list_empty(&priv->cm.reap_list))
// remove from list, unlock
ipoib_cm_tx_destroy(p)
if (p->id) ib_destroy_cm_id(p->id);
// wait for all competions
if (p->qp) ib_destroy_qp(p->qp);
vfree(p->tx_ring);
kfree(p);
// lock
spin_unlock_irqrestore(&priv->lock, flags);
netif_tx_unlock_bh(dev);
// ipoib_flush_paths() is called when bringing the network interface down,
// or when changing between connected<->datagram modes.
ipoib_flush_paths()
// This protects priv->path_tree from ipoib_start_xmit()
netif_tx_lock_bh(dev);
spin_lock_irqsave(&priv->lock, flags);
// remove path from lookup table, add to remove_list
list_for_each_entry_safe(path, tp, &remove_list, list)
// Cancel any outstanding SA path query for this path
// unlock and wait for SA query completion
wait_for_completion(&path->done)
path_free(path)
foreach neigh in path->neigh_list
ipoib_neigh_flush(neigh)
// lock to protect against ipoib_start_xmit()
// return struct ipoib_neigh back to initial state
// unlock
kfree(path)
// lock
spin_unlock_irqrestore(&priv->lock, flags);
netif_tx_unlock_bh(dev);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread[parent not found: <20100817203619.22174.62871.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>]
* [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path [not found] ` <20100817203619.22174.62871.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org> @ 2010-08-17 20:36 ` Ralph Campbell [not found] ` <20100817203624.22174.69480.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org> 2010-08-17 20:48 ` [PATCH v5 0/1] " Ralph Campbell 1 sibling, 1 reply; 7+ messages in thread From: Ralph Campbell @ 2010-08-17 20:36 UTC (permalink / raw) To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA Basically, a struct sk_buff has a pointer to a struct dst_entry which has a pointer to a struct neighbour. IPoIB uses struct neighbour.ha to store the IB "hardware address" and a pointer to a struct ipoib_neigh. When using connected mode, struct ipoib_neigh has a pointer to struct ipoib_cm_tx which contains pointers back to struct ipoib_neigh and ipoib_path. The core network code will call ipoib_neigh_cleanup() when it is destroying struct neighbour and IPoIB should guarantee that the struct ipoib_neigh and all the memory it points to are destroyed. Also, the connected mode RC QP contains a pointer back to the struct ipoib_cm_tx which can be dereferenced in the CQ completion handler. The problem is that there are several places where the struct ipoib_cm_tx can be used after it has been freed. The easiest way to reproduce this bug is to run a UDP bandwidth test while loading/unloading the IPoIB module or ifup/ifdown the interface. The fix is rather complex since the RC QP connection setup, tear down, and CQ completion draining are asynchronous processes. The struct ipoib_cm_tx goes through four "states": 1) Newly created by ipoib_cm_create_tx() neigh!=NULL, flags==0, and linked on priv->cm.start_list. 2) Being used by ipoib_cm_tx_start() neigh!=NULL, not on priv->cm.start_list, flags==0, and in the process of starting CM. 3) Being used by CM or sending data on the RC QP neigh!=NULL, not on priv->cm.start_list, flags & IPOIB_FLAG_INITIALIZED. 4) Being destroyed tx->neigh==NULL and on priv->cm.reap_task list or being destroyed by ipoib_cm_tx_reap(). Signed-off-by: Ralph Campbell <ralph.campbell-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org> --- drivers/infiniband/ulp/ipoib/ipoib.h | 14 + drivers/infiniband/ulp/ipoib/ipoib_cm.c | 108 ++++++---- drivers/infiniband/ulp/ipoib/ipoib_main.c | 266 ++++++++++++------------ drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 76 ++----- 4 files changed, 223 insertions(+), 241 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 753a983..5a842d7 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -415,9 +415,7 @@ static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, - struct net_device *dev); -void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); +void ipoib_neigh_flush(struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; @@ -464,7 +462,8 @@ void ipoib_dev_cleanup(struct net_device *dev); void ipoib_mcast_join_task(struct work_struct *work); void ipoib_mcast_carrier_on_task(struct work_struct *work); -void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb); +void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb, + struct ipoib_neigh *neigh); void ipoib_mcast_restart_task(struct work_struct *work); int ipoib_mcast_start_thread(struct net_device *dev); @@ -570,6 +569,7 @@ void ipoib_cm_dev_cleanup(struct net_device *dev); struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path, struct ipoib_neigh *neigh); void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx); +void ipoib_cm_flush_path(struct net_device *dev, struct ipoib_path *path); void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb, unsigned int mtu); void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc); @@ -659,6 +659,12 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx) } static inline +void ipoib_cm_flush_path(struct net_device *dev, struct ipoib_path *path) +{ + return; +} + +static inline int ipoib_cm_add_mode_attr(struct net_device *dev) { return 0; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index bb10041..c1f3a65 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -799,31 +799,14 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) { - struct ipoib_neigh *neigh; - ipoib_dbg(priv, "failed cm send event " "(status=%d, wrid=%d vend_err %x)\n", wc->status, wr_id, wc->vendor_err); spin_lock_irqsave(&priv->lock, flags); - neigh = tx->neigh; - - if (neigh) { - neigh->cm = NULL; - list_del(&neigh->list); - if (neigh->ah) - ipoib_put_ah(neigh->ah); - ipoib_neigh_free(dev, neigh); - - tx->neigh = NULL; - } - - if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { - list_move(&tx->list, &priv->cm.reap_list); - queue_work(ipoib_workqueue, &priv->cm.reap_task); - } clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags); + ipoib_cm_destroy_tx(tx); spin_unlock_irqrestore(&priv->lock, flags); } @@ -1204,7 +1187,6 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, struct ipoib_cm_tx *tx = cm_id->context; struct ipoib_dev_priv *priv = netdev_priv(tx->dev); struct net_device *dev = priv->dev; - struct ipoib_neigh *neigh; unsigned long flags; int ret; @@ -1226,22 +1208,8 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id, ipoib_dbg(priv, "CM error %d.\n", event->event); netif_tx_lock_bh(dev); spin_lock_irqsave(&priv->lock, flags); - neigh = tx->neigh; - if (neigh) { - neigh->cm = NULL; - list_del(&neigh->list); - if (neigh->ah) - ipoib_put_ah(neigh->ah); - ipoib_neigh_free(dev, neigh); - - tx->neigh = NULL; - } - - if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { - list_move(&tx->list, &priv->cm.reap_list); - queue_work(ipoib_workqueue, &priv->cm.reap_task); - } + ipoib_cm_destroy_tx(tx); spin_unlock_irqrestore(&priv->lock, flags); netif_tx_unlock_bh(dev); @@ -1268,20 +1236,61 @@ struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path tx->path = path; tx->dev = dev; list_add(&tx->list, &priv->cm.start_list); - set_bit(IPOIB_FLAG_INITIALIZED, &tx->flags); queue_work(ipoib_workqueue, &priv->cm.start_task); return tx; } +/* + * Note: this is called with netif_tx_lock_bh() and priv->lock held. + */ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx) { struct ipoib_dev_priv *priv = netdev_priv(tx->dev); - if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + struct ipoib_neigh *neigh = tx->neigh; + + if (!neigh) + return; + + ipoib_dbg(priv, "Reap connection for gid %pI6\n", neigh->dgid.raw); + neigh->cm = NULL; + tx->neigh = NULL; + + /* Force ipoib_start_xmit() to start a new connection if called */ + if (neigh->ah) { + ipoib_put_ah(neigh->ah); + neigh->ah = NULL; + memset(&neigh->dgid.raw, 0, sizeof(union ib_gid)); + } + + /* + * If ipoib_cm_tx_start() is actively using this tx, + * don't delete it by putting it on the reap_list. + * Instead, ipoib_cm_tx_start() will handle the destruction. + */ + if (!list_empty(&tx->list) || + test_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { list_move(&tx->list, &priv->cm.reap_list); queue_work(ipoib_workqueue, &priv->cm.reap_task); - ipoib_dbg(priv, "Reap connection for gid %pI6\n", - tx->neigh->dgid.raw); - tx->neigh = NULL; + } +} + +/* + * Search the list of connections to be started and remove any entries + * which match the path being destroyed. + * + * This should be called with netif_tx_lock_bh() and priv->lock held. + */ +void ipoib_cm_flush_path(struct net_device *dev, struct ipoib_path *path) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ipoib_cm_tx *tx, *nx; + + list_for_each_entry_safe(tx, nx, &priv->cm.start_list, list) { + tx = list_entry(priv->cm.start_list.next, typeof(*tx), list); + if (tx->path == path) { + list_del(&tx->list); + kfree(tx); + } } } @@ -1307,6 +1316,7 @@ static void ipoib_cm_tx_start(struct work_struct *work) neigh = p->neigh; qpn = IPOIB_QPN(neigh->neighbour->ha); memcpy(&pathrec, &p->path->pathrec, sizeof pathrec); + p->path = NULL; spin_unlock_irqrestore(&priv->lock, flags); netif_tx_unlock_bh(dev); @@ -1316,18 +1326,24 @@ static void ipoib_cm_tx_start(struct work_struct *work) netif_tx_lock_bh(dev); spin_lock_irqsave(&priv->lock, flags); - if (ret) { - neigh = p->neigh; + /* + * If ipoib_cm_destroy_tx() was called or there was an error, + * we need to destroy tx. + */ + neigh = p->neigh; + if (!neigh || ret) { if (neigh) { neigh->cm = NULL; - list_del(&neigh->list); - if (neigh->ah) + if (neigh->ah) { ipoib_put_ah(neigh->ah); - ipoib_neigh_free(dev, neigh); + neigh->ah = NULL; + memset(&neigh->dgid.raw, 0, + sizeof(union ib_gid)); + } } - list_del(&p->list); kfree(p); - } + } else + set_bit(IPOIB_FLAG_INITIALIZED, &p->flags); } spin_unlock_irqrestore(&priv->lock, flags); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index b4b2257..c867519 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -91,6 +91,9 @@ struct workqueue_struct *ipoib_workqueue; struct ib_sa_client ipoib_sa_client; +static struct ipoib_neigh *neigh_alloc(struct neighbour *neighbour, + struct net_device *dev); + static void ipoib_add_one(struct ib_device *device); static void ipoib_remove_one(struct ib_device *device); @@ -261,32 +264,16 @@ static int __path_add(struct net_device *dev, struct ipoib_path *path) return 0; } -static void path_free(struct net_device *dev, struct ipoib_path *path) +static void path_free(struct ipoib_path *path) { - struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_neigh *neigh, *tn; struct sk_buff *skb; - unsigned long flags; while ((skb = __skb_dequeue(&path->queue))) dev_kfree_skb_irq(skb); - spin_lock_irqsave(&priv->lock, flags); - - list_for_each_entry_safe(neigh, tn, &path->neigh_list, list) { - /* - * It's safe to call ipoib_put_ah() inside priv->lock - * here, because we know that path->ah will always - * hold one more reference, so ipoib_put_ah() will - * never do more than decrement the ref count. - */ - if (neigh->ah) - ipoib_put_ah(neigh->ah); - - ipoib_neigh_free(dev, neigh); - } - - spin_unlock_irqrestore(&priv->lock, flags); + list_for_each_entry_safe(neigh, tn, &path->neigh_list, list) + ipoib_neigh_flush(neigh); if (path->ah) ipoib_put_ah(path->ah); @@ -387,10 +374,11 @@ void ipoib_flush_paths(struct net_device *dev) list_for_each_entry_safe(path, tp, &remove_list, list) { if (path->query) ib_sa_cancel_query(path->query_id, path->query); + ipoib_cm_flush_path(dev, path); spin_unlock_irqrestore(&priv->lock, flags); netif_tx_unlock_bh(dev); wait_for_completion(&path->done); - path_free(dev, path); + path_free(path); netif_tx_lock_bh(dev); spin_lock_irqsave(&priv->lock, flags); } @@ -460,19 +448,15 @@ static void path_rec_completion(int status, memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, sizeof(union ib_gid)); - if (ipoib_cm_enabled(dev, neigh->neighbour)) { - if (!ipoib_cm_get(neigh)) - ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, - path, - neigh)); - if (!ipoib_cm_get(neigh)) { - list_del(&neigh->list); - if (neigh->ah) - ipoib_put_ah(neigh->ah); - ipoib_neigh_free(dev, neigh); - continue; - } - } + /* + * If connected mode is enabled but not started, + * start getting a connection. + */ + if (ipoib_cm_enabled(dev, neigh->neighbour) && + !ipoib_cm_get(neigh)) + ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, + path, + neigh)); while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); @@ -520,6 +504,8 @@ static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid) path->pathrec.numb_path = 1; path->pathrec.traffic_class = priv->broadcast->mcmember.traffic_class; + __path_add(dev, path); + return path; } @@ -554,32 +540,27 @@ static int path_rec_start(struct net_device *dev, return 0; } -static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) +static void neigh_add_path(struct sk_buff *skb, struct net_device *dev, + struct ipoib_neigh *neigh) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path; - struct ipoib_neigh *neigh; + struct neighbour *n; unsigned long flags; - neigh = ipoib_neigh_alloc(skb_dst(skb)->neighbour, skb->dev); - if (!neigh) { - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - return; - } + n = skb_dst(skb)->neighbour; spin_lock_irqsave(&priv->lock, flags); - path = __path_find(dev, skb_dst(skb)->neighbour->ha + 4); + path = __path_find(dev, n->ha + 4); if (!path) { - path = path_rec_create(dev, skb_dst(skb)->neighbour->ha + 4); + path = path_rec_create(dev, n->ha + 4); if (!path) - goto err_path; - - __path_add(dev, path); + goto err_drop; } - list_add_tail(&neigh->list, &path->neigh_list); + if (list_empty(&neigh->list)) + list_add_tail(&neigh->list, &path->neigh_list); if (path->ah) { kref_get(&path->ah->ref); @@ -587,16 +568,11 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw, sizeof(union ib_gid)); - if (ipoib_cm_enabled(dev, neigh->neighbour)) { + if (ipoib_cm_enabled(dev, n)) { if (!ipoib_cm_get(neigh)) ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, path, neigh)); - if (!ipoib_cm_get(neigh)) { - list_del(&neigh->list); - if (neigh->ah) - ipoib_put_ah(neigh->ah); - ipoib_neigh_free(dev, neigh); + if (!ipoib_cm_get(neigh)) goto err_drop; - } if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) __skb_queue_tail(&neigh->queue, skb); else { @@ -606,12 +582,10 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) } } else { spin_unlock_irqrestore(&priv->lock, flags); - ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb_dst(skb)->neighbour->ha)); + ipoib_send(dev, skb, path->ah, IPOIB_QPN(n->ha)); return; } } else { - neigh->ah = NULL; - if (!path->query && path_rec_start(dev, path)) goto err_list; @@ -622,10 +596,7 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) return; err_list: - list_del(&neigh->list); - -err_path: - ipoib_neigh_free(dev, neigh); + list_del_init(&neigh->list); err_drop: ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -633,20 +604,22 @@ err_drop: spin_unlock_irqrestore(&priv->lock, flags); } -static void ipoib_path_lookup(struct sk_buff *skb, struct net_device *dev) +static void path_lookup(struct sk_buff *skb, struct net_device *dev, + struct ipoib_neigh *neigh) { - struct ipoib_dev_priv *priv = netdev_priv(skb->dev); + struct ipoib_dev_priv *priv; /* Look up path record for unicasts */ if (skb_dst(skb)->neighbour->ha[4] != 0xff) { - neigh_add_path(skb, dev); + neigh_add_path(skb, dev, neigh); return; } /* Add in the P_Key for multicasts */ + priv = netdev_priv(dev); skb_dst(skb)->neighbour->ha[8] = (priv->pkey >> 8) & 0xff; skb_dst(skb)->neighbour->ha[9] = priv->pkey & 0xff; - ipoib_mcast_send(dev, skb_dst(skb)->neighbour->ha + 4, skb); + ipoib_mcast_send(dev, skb_dst(skb)->neighbour->ha + 4, skb, neigh); } static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, @@ -659,35 +632,13 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, spin_lock_irqsave(&priv->lock, flags); path = __path_find(dev, phdr->hwaddr + 4); - if (!path || !path->valid) { - int new_path = 0; - - if (!path) { - path = path_rec_create(dev, phdr->hwaddr + 4); - new_path = 1; - } - if (path) { - /* put pseudoheader back on for next time */ - skb_push(skb, sizeof *phdr); - __skb_queue_tail(&path->queue, skb); - - if (!path->query && path_rec_start(dev, path)) { - spin_unlock_irqrestore(&priv->lock, flags); - if (new_path) - path_free(dev, path); - return; - } else - __path_add(dev, path); - } else { - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - } - - spin_unlock_irqrestore(&priv->lock, flags); - return; + if (!path) { + path = path_rec_create(dev, phdr->hwaddr + 4); + if (!path) + goto drop; } - if (path->ah) { + if (path->valid && path->ah) { ipoib_dbg(priv, "Send unicast ARP to %04x\n", be16_to_cpu(path->pathrec.dlid)); @@ -699,12 +650,16 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, /* put pseudoheader back on for next time */ skb_push(skb, sizeof *phdr); __skb_queue_tail(&path->queue, skb); - } else { - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - } + } else + goto drop; spin_unlock_irqrestore(&priv->lock, flags); + return; + +drop: + ++dev->stats.tx_dropped; + dev_kfree_skb_any(skb); + spin_unlock_irqrestore(&priv->lock, flags); } static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -714,31 +669,23 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) unsigned long flags; if (likely(skb_dst(skb) && skb_dst(skb)->neighbour)) { - if (unlikely(!*to_ipoib_neigh(skb_dst(skb)->neighbour))) { - ipoib_path_lookup(skb, dev); + neigh = *to_ipoib_neigh(skb_dst(skb)->neighbour); + + if (unlikely(!neigh)) { + neigh = neigh_alloc(skb_dst(skb)->neighbour, skb->dev); + if (!neigh) { + ++dev->stats.tx_dropped; + dev_kfree_skb_any(skb); + } else + path_lookup(skb, dev, neigh); return NETDEV_TX_OK; } - neigh = *to_ipoib_neigh(skb_dst(skb)->neighbour); - if (unlikely((memcmp(&neigh->dgid.raw, skb_dst(skb)->neighbour->ha + 4, sizeof(union ib_gid))) || (neigh->dev != dev))) { - spin_lock_irqsave(&priv->lock, flags); - /* - * It's safe to call ipoib_put_ah() inside - * priv->lock here, because we know that - * path->ah will always hold one more reference, - * so ipoib_put_ah() will never do more than - * decrement the ref count. - */ - if (neigh->ah) - ipoib_put_ah(neigh->ah); - list_del(&neigh->list); - ipoib_neigh_free(dev, neigh); - spin_unlock_irqrestore(&priv->lock, flags); - ipoib_path_lookup(skb, dev); + path_lookup(skb, dev, neigh); return NETDEV_TX_OK; } @@ -770,7 +717,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) phdr->hwaddr[8] = (priv->pkey >> 8) & 0xff; phdr->hwaddr[9] = priv->pkey & 0xff; - ipoib_mcast_send(dev, phdr->hwaddr + 4, skb); + ipoib_mcast_send(dev, phdr->hwaddr + 4, skb, NULL); } else { /* unicast GID -- should be ARP or RARP reply */ @@ -847,34 +794,48 @@ static void ipoib_neigh_cleanup(struct neighbour *n) { struct ipoib_neigh *neigh; struct ipoib_dev_priv *priv = netdev_priv(n->dev); + struct sk_buff *skb; unsigned long flags; - struct ipoib_ah *ah = NULL; + + netif_tx_lock_bh(n->dev); + spin_lock_irqsave(&priv->lock, flags); neigh = *to_ipoib_neigh(n); - if (neigh) - priv = netdev_priv(neigh->dev); - else + if (!neigh) { + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(n->dev); return; + } + *to_ipoib_neigh(n) = NULL; + neigh->neighbour = NULL; + ipoib_dbg(priv, "neigh_cleanup for %06x %pI6\n", IPOIB_QPN(n->ha), n->ha + 4); - spin_lock_irqsave(&priv->lock, flags); + if (ipoib_cm_get(neigh)) + ipoib_cm_destroy_tx(ipoib_cm_get(neigh)); - if (neigh->ah) - ah = neigh->ah; - list_del(&neigh->list); - ipoib_neigh_free(n->dev, neigh); + if (!list_empty(&neigh->list)) + list_del_init(&neigh->list); spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(n->dev); - if (ah) - ipoib_put_ah(ah); + while ((skb = __skb_dequeue(&neigh->queue))) { + ++n->dev->stats.tx_dropped; + dev_kfree_skb_any(skb); + } + + if (neigh->ah) + ipoib_put_ah(neigh->ah); + + kfree(neigh); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, - struct net_device *dev) +static struct ipoib_neigh *neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -882,27 +843,58 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, if (!neigh) return NULL; + neigh->ah = NULL; neigh->neighbour = neighbour; neigh->dev = dev; memset(&neigh->dgid.raw, 0, sizeof (union ib_gid)); *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(&neigh->queue); + INIT_LIST_HEAD(&neigh->list); ipoib_cm_set(neigh, NULL); return neigh; } -void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh) +/* + * This is called when flushing the path or multicast GID from the + * struct ipoib_neigh. ipoib_start_xmit() will then try to reinitialize + * the address the next time it is called. + * Note that the "neigh" pointer passed should not be used after calling this. + */ +void ipoib_neigh_flush(struct ipoib_neigh *neigh) { + struct net_device *dev = neigh->dev; + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct sk_buff_head skqueue; struct sk_buff *skb; - *to_ipoib_neigh(neigh->neighbour) = NULL; - while ((skb = __skb_dequeue(&neigh->queue))) { - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - } + struct ipoib_ah *ah; + unsigned long flags; + + skb_queue_head_init(&skqueue); + + netif_tx_lock_bh(dev); + spin_lock_irqsave(&priv->lock, flags); + + while ((skb = __skb_dequeue(&neigh->queue))) + __skb_queue_tail(&skqueue, skb); + + ah = neigh->ah; + neigh->ah = NULL; + memset(&neigh->dgid.raw, 0, sizeof(union ib_gid)); + list_del_init(&neigh->list); if (ipoib_cm_get(neigh)) ipoib_cm_destroy_tx(ipoib_cm_get(neigh)); - kfree(neigh); + + spin_unlock_irqrestore(&priv->lock, flags); + netif_tx_unlock_bh(dev); + + while ((skb = __skb_dequeue(&skqueue))) { + ++dev->stats.tx_dropped; + dev_kfree_skb_irq(skb); + } + + if (ah) + ipoib_put_ah(ah); } static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms) @@ -1163,7 +1155,7 @@ static ssize_t create_child(struct device *dev, return ret ? ret : count; } -static DEVICE_ATTR(create_child, S_IWUSR, NULL, create_child); +static DEVICE_ATTR(create_child, S_IWUGO, NULL, create_child); static ssize_t delete_child(struct device *dev, struct device_attribute *attr, @@ -1183,7 +1175,7 @@ static ssize_t delete_child(struct device *dev, return ret ? ret : count; } -static DEVICE_ATTR(delete_child, S_IWUSR, NULL, delete_child); +static DEVICE_ATTR(delete_child, S_IWUGO, NULL, delete_child); int ipoib_add_pkey_attr(struct net_device *dev) { diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 3871ac6..9ab3b4e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -68,28 +68,14 @@ struct ipoib_mcast_iter { static void ipoib_mcast_free(struct ipoib_mcast *mcast) { struct net_device *dev = mcast->dev; - struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_neigh *neigh, *tmp; int tx_dropped = 0; ipoib_dbg_mcast(netdev_priv(dev), "deleting multicast group %pI6\n", mcast->mcmember.mgid.raw); - spin_lock_irq(&priv->lock); - - list_for_each_entry_safe(neigh, tmp, &mcast->neigh_list, list) { - /* - * It's safe to call ipoib_put_ah() inside priv->lock - * here, because we know that mcast->ah will always - * hold one more reference, so ipoib_put_ah() will - * never do more than decrement the ref count. - */ - if (neigh->ah) - ipoib_put_ah(neigh->ah); - ipoib_neigh_free(dev, neigh); - } - - spin_unlock_irq(&priv->lock); + list_for_each_entry_safe(neigh, tmp, &mcast->neigh_list, list) + ipoib_neigh_flush(neigh); if (mcast->ah) ipoib_put_ah(mcast->ah); @@ -655,7 +641,8 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) return 0; } -void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb) +void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb, + struct ipoib_neigh *neigh) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_mcast *mcast; @@ -665,11 +652,8 @@ void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb) if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags) || !priv->broadcast || - !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - goto unlock; - } + !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) + goto drop; mcast = __ipoib_mcast_find(dev, mgid); if (!mcast) { @@ -681,9 +665,7 @@ void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb) if (!mcast) { ipoib_warn(priv, "unable to allocate memory for " "multicast structure\n"); - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - goto out; + goto drop; } set_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags); @@ -693,39 +675,20 @@ void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb) } if (!mcast->ah) { - if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) - skb_queue_tail(&mcast->pkt_queue, skb); - else { - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - } - + if (skb_queue_len(&mcast->pkt_queue) >= IPOIB_MAX_MCAST_QUEUE) + goto drop; + skb_queue_tail(&mcast->pkt_queue, skb); if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) ipoib_dbg_mcast(priv, "no address vector, " "but multicast join already started\n"); else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) ipoib_mcast_sendonly_join(mcast); - - /* - * If lookup completes between here and out:, don't - * want to send packet twice. - */ - mcast = NULL; - } - -out: - if (mcast && mcast->ah) { - if (skb_dst(skb) && - skb_dst(skb)->neighbour && - !*to_ipoib_neigh(skb_dst(skb)->neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb_dst(skb)->neighbour, - skb->dev); - - if (neigh) { - kref_get(&mcast->ah->ref); - neigh->ah = mcast->ah; - list_add_tail(&neigh->list, &mcast->neigh_list); - } + } else { + if (neigh && list_empty(&neigh->list)) { + kref_get(&mcast->ah->ref); + neigh->ah = mcast->ah; + memcpy(neigh->dgid.raw, mgid, sizeof(union ib_gid)); + list_add_tail(&neigh->list, &mcast->neigh_list); } spin_unlock_irqrestore(&priv->lock, flags); @@ -733,8 +696,13 @@ out: return; } -unlock: spin_unlock_irqrestore(&priv->lock, flags); + return; + +drop: + spin_unlock_irqrestore(&priv->lock, flags); + ++dev->stats.tx_dropped; + dev_kfree_skb_any(skb); } void ipoib_mcast_dev_flush(struct net_device *dev) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 7+ messages in thread
[parent not found: <20100817203624.22174.69480.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>]
* Re: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path [not found] ` <20100817203624.22174.69480.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org> @ 2010-09-03 3:41 ` Pradeep Satyanarayana [not found] ` <4C806E72.6030507-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Pradeep Satyanarayana @ 2010-09-03 3:41 UTC (permalink / raw) To: Ralph Campbell; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA Ralph, I see the following crash sporadically (only under stress) with a Sles11SP1 (which is 2.6.32 kernel). I saw this crash with V4 of your patch and have not yet had a chance to try V5. Have you seen this in your testing? If this not the crash stack can you please share what your patch fixes? <4>ib0: RX drain timing out <4>idr_remove called for id=11491974 which is not allocated. <4>Call Trace: <4>[c000000749fe33b0] [c0000000000129e4] .show_stack+0x6c/0x198 (unreliable) <4>[c000000749fe3460] [c0000000002ea594] .sub_remove+0x1ec/0x1f8 <4>[c000000749fe3520] [c0000000002ea5e0] .idr_remove+0x40/0xf8 <4>[c000000749fe35b0] [d000000012d84d70] .cm_destroy_id+0xa0/0x520 [ib_cm] <4>[c000000749fe3680] [d00000001b7fb644] .ipoib_cm_free_rx_reap_list+0xd4/0x190 [ib_ipoib] <4>[c000000749fe3740] [d00000001b7fe404] .ipoib_cm_dev_stop+0x23c/0x360 [ib_ipoib] <4>[c000000749fe3800] [d00000001b7f4dbc] .ipoib_ib_dev_stop+0xe4/0x4b0 [ib_ipoib] <4>[c000000749fe3960] [d00000001b7f0f30] .ipoib_stop+0x88/0x178 [ib_ipoib] <4>[c000000749fe39f0] [c0000000004eacf4] .dev_close+0xdc/0x148 <4>[c000000749fe3a80] [c0000000004ea2b8] .dev_change_flags+0x1f0/0x288 <4>[c000000749fe3b20] [d00000001b7f11b8] .ipoib_remove_one+0xb8/0x140 [ib_ipoib] <4>[c000000749fe3bc0] [d00000001210425c] .ib_unregister_client+0xb4/0x1b8 [ib_core] <4>[c000000749fe3c90] [d00000001b7ffde8] .ipoib_cleanup_module+0x20/0x60 [ib_ipoib] <4>[c000000749fe3d20] [c0000000000ec408] .SyS_delete_module+0x238/0x320 <4>[c000000749fe3e30] [c0000000000085b4] syscall_exit+0x0/0x40 <1>Unable to handle kernel paging request for data at address 0x45000027228d1ffb <1>Faulting instruction address: 0xc0000000005a8e88 12:mon> e cpu 0x12: Vector: 300 (Data Access) at [c000000749fe3250] pc: c0000000005a8e88: .wait_for_common+0xb8/0x268 lr: c0000000005a8e20: .wait_for_common+0x50/0x268 sp: c000000749fe34d0 msr: 8000000000009032 dar: 45000027228d1ffb dsisr: 42000000 current = 0xc00000074b4ce0e0 paca = 0xc000000000f64a00 pid = 13605, comm = modprobe 12:mon> Thanks Pradeep -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <4C806E72.6030507-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* RE: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path [not found] ` <4C806E72.6030507-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2010-09-03 16:06 ` Ralph Campbell 2010-10-01 0:35 ` Ralph Campbell 1 sibling, 0 replies; 7+ messages in thread From: Ralph Campbell @ 2010-09-03 16:06 UTC (permalink / raw) To: Pradeep Satyanarayana Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org I haven't seen this stack trace before. Since it involves the RX QP connection and my patch changes the TX QP connection, I doubt my patch has any effect on this case. When I get some time, I will look to see if I can find similar races in the RX connection set up & tear down that might exist. ________________________________________ From: Pradeep Satyanarayana [pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org] Sent: Thursday, September 02, 2010 8:41 PM To: Ralph Campbell Cc: Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Subject: Re: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path Ralph, I see the following crash sporadically (only under stress) with a Sles11SP1 (which is 2.6.32 kernel). I saw this crash with V4 of your patch and have not yet had a chance to try V5. Have you seen this in your testing? If this not the crash stack can you please share what your patch fixes? <4>ib0: RX drain timing out <4>idr_remove called for id=11491974 which is not allocated. <4>Call Trace: <4>[c000000749fe33b0] [c0000000000129e4] .show_stack+0x6c/0x198 (unreliable) <4>[c000000749fe3460] [c0000000002ea594] .sub_remove+0x1ec/0x1f8 <4>[c000000749fe3520] [c0000000002ea5e0] .idr_remove+0x40/0xf8 <4>[c000000749fe35b0] [d000000012d84d70] .cm_destroy_id+0xa0/0x520 [ib_cm] <4>[c000000749fe3680] [d00000001b7fb644] .ipoib_cm_free_rx_reap_list+0xd4/0x190 [ib_ipoib] <4>[c000000749fe3740] [d00000001b7fe404] .ipoib_cm_dev_stop+0x23c/0x360 [ib_ipoib] <4>[c000000749fe3800] [d00000001b7f4dbc] .ipoib_ib_dev_stop+0xe4/0x4b0 [ib_ipoib] <4>[c000000749fe3960] [d00000001b7f0f30] .ipoib_stop+0x88/0x178 [ib_ipoib] <4>[c000000749fe39f0] [c0000000004eacf4] .dev_close+0xdc/0x148 <4>[c000000749fe3a80] [c0000000004ea2b8] .dev_change_flags+0x1f0/0x288 <4>[c000000749fe3b20] [d00000001b7f11b8] .ipoib_remove_one+0xb8/0x140 [ib_ipoib] <4>[c000000749fe3bc0] [d00000001210425c] .ib_unregister_client+0xb4/0x1b8 [ib_core] <4>[c000000749fe3c90] [d00000001b7ffde8] .ipoib_cleanup_module+0x20/0x60 [ib_ipoib] <4>[c000000749fe3d20] [c0000000000ec408] .SyS_delete_module+0x238/0x320 <4>[c000000749fe3e30] [c0000000000085b4] syscall_exit+0x0/0x40 <1>Unable to handle kernel paging request for data at address 0x45000027228d1ffb <1>Faulting instruction address: 0xc0000000005a8e88 12:mon> e cpu 0x12: Vector: 300 (Data Access) at [c000000749fe3250] pc: c0000000005a8e88: .wait_for_common+0xb8/0x268 lr: c0000000005a8e20: .wait_for_common+0x50/0x268 sp: c000000749fe34d0 msr: 8000000000009032 dar: 45000027228d1ffb dsisr: 42000000 current = 0xc00000074b4ce0e0 paca = 0xc000000000f64a00 pid = 13605, comm = modprobe 12:mon> Thanks Pradeep -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path [not found] ` <4C806E72.6030507-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2010-09-03 16:06 ` Ralph Campbell @ 2010-10-01 0:35 ` Ralph Campbell [not found] ` <1285893348.22791.120.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org> 1 sibling, 1 reply; 7+ messages in thread From: Ralph Campbell @ 2010-10-01 0:35 UTC (permalink / raw) To: Pradeep Satyanarayana Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [-- Attachment #1: Type: text/plain, Size: 2592 bytes --] I was looking at the Rx connection tear down and found a bug. I don't know if it would cause this panic but you might try it. I haven't stress tested it but it compiles and basic network connections work. I also don't like the call to cancel_delayed_work(&priv->cm.stale_task) at the end of ipoib_cm_dev_stop(). I think it should be called after ib_destroy_cm_id() and priv->cm.id = NULL. On Thu, 2010-09-02 at 20:41 -0700, Pradeep Satyanarayana wrote: > Ralph, > > I see the following crash sporadically (only under stress) with a Sles11SP1 (which is 2.6.32 kernel). > I saw this crash with V4 of your patch and have not yet had a chance to try V5. Have you seen this > in your testing? If this not the crash stack can you please share what your patch fixes? > > <4>ib0: RX drain timing out > <4>idr_remove called for id=11491974 which is not allocated. > <4>Call Trace: > <4>[c000000749fe33b0] [c0000000000129e4] .show_stack+0x6c/0x198 (unreliable) > <4>[c000000749fe3460] [c0000000002ea594] .sub_remove+0x1ec/0x1f8 > <4>[c000000749fe3520] [c0000000002ea5e0] .idr_remove+0x40/0xf8 > <4>[c000000749fe35b0] [d000000012d84d70] .cm_destroy_id+0xa0/0x520 [ib_cm] > <4>[c000000749fe3680] [d00000001b7fb644] .ipoib_cm_free_rx_reap_list+0xd4/0x190 [ib_ipoib] > <4>[c000000749fe3740] [d00000001b7fe404] .ipoib_cm_dev_stop+0x23c/0x360 [ib_ipoib] > <4>[c000000749fe3800] [d00000001b7f4dbc] .ipoib_ib_dev_stop+0xe4/0x4b0 [ib_ipoib] > <4>[c000000749fe3960] [d00000001b7f0f30] .ipoib_stop+0x88/0x178 [ib_ipoib] > <4>[c000000749fe39f0] [c0000000004eacf4] .dev_close+0xdc/0x148 > <4>[c000000749fe3a80] [c0000000004ea2b8] .dev_change_flags+0x1f0/0x288 > <4>[c000000749fe3b20] [d00000001b7f11b8] .ipoib_remove_one+0xb8/0x140 [ib_ipoib] > <4>[c000000749fe3bc0] [d00000001210425c] .ib_unregister_client+0xb4/0x1b8 [ib_core] > <4>[c000000749fe3c90] [d00000001b7ffde8] .ipoib_cleanup_module+0x20/0x60 [ib_ipoib] > <4>[c000000749fe3d20] [c0000000000ec408] .SyS_delete_module+0x238/0x320 > <4>[c000000749fe3e30] [c0000000000085b4] syscall_exit+0x0/0x40 > <1>Unable to handle kernel paging request for data at address 0x45000027228d1ffb > <1>Faulting instruction address: 0xc0000000005a8e88 > 12:mon> e > cpu 0x12: Vector: 300 (Data Access) at [c000000749fe3250] > pc: c0000000005a8e88: .wait_for_common+0xb8/0x268 > lr: c0000000005a8e20: .wait_for_common+0x50/0x268 > sp: c000000749fe34d0 > msr: 8000000000009032 > dar: 45000027228d1ffb > dsisr: 42000000 > current = 0xc00000074b4ce0e0 > paca = 0xc000000000f64a00 > pid = 13605, comm = modprobe > 12:mon> > > Thanks > Pradeep [-- Attachment #2: ipoib-rx-drain-race.patch --] [-- Type: text/x-patch, Size: 2129 bytes --] IB/ipoib: fix race when handling IPOIB_CM_RX_DRAIN_WRID From: Ralph Campbell <ralph.campbell-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org> ipoib_cm_start_rx_drain() calls ib_post_send() and *then* moves the struct ipoib_cm_rx onto the rx_drain_list. The ib_post_send() will trigger a completion callback to ipoib_cm_handle_rx_wc() which tries to move the rx_drain_list to the rx_reap_list but if the callback happens before ipoib_cm_start_rx_drain() has moved the structure, it is left in limbo. The fix is to change ipoib_cm_start_rx_drain() to put the struct on the rx_drain_list and then call ib_post_send(). Also, only move one struct from rx_flush_list to rx_drain_list since concurrent IPOIB_CM_RX_DRAIN_WRID events on different QPs could put multiple ipoib_cm_rx structs on rx_flush_list. Signed-off-by: Ralph Campbell <ralph.campbell-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org> --- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 12 +++++++++--- 1 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index bb10041..dfff159 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -216,15 +216,21 @@ static void ipoib_cm_start_rx_drain(struct ipoib_dev_priv *priv) !list_empty(&priv->cm.rx_drain_list)) return; + p = list_entry(priv->cm.rx_flush_list.next, typeof(*p), list); + + /* + * Put p on rx_drain_list before calling ib_post_send() or there + * is a race with the ipoib_cm_handle_rx_wc() completion handler + * trying to remove it from rx_drain_list. + */ + list_move(&p->list, &priv->cm.rx_drain_list); + /* * QPs on flush list are error state. This way, a "flush * error" WC will be immediately generated for each WR we post. */ - p = list_entry(priv->cm.rx_flush_list.next, typeof(*p), list); if (ib_post_send(p->qp, &ipoib_cm_rx_drain_wr, &bad_wr)) ipoib_warn(priv, "failed to post drain wr\n"); - - list_splice_init(&priv->cm.rx_flush_list, &priv->cm.rx_drain_list); } static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx) ^ permalink raw reply related [flat|nested] 7+ messages in thread
[parent not found: <1285893348.22791.120.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>]
* Re: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path [not found] ` <1285893348.22791.120.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org> @ 2010-10-01 22:38 ` Pradeep Satyanarayana 0 siblings, 0 replies; 7+ messages in thread From: Pradeep Satyanarayana @ 2010-10-01 22:38 UTC (permalink / raw) To: Ralph Campbell Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 9/30/2010 5:35 PM, Ralph Campbell wrote: > I was looking at the Rx connection tear down and found a bug. > I don't know if it would cause this panic but you might try it. > I haven't stress tested it but it compiles and basic network > connections work. > > I also don't like the call to cancel_delayed_work(&priv->cm.stale_task) > at the end of ipoib_cm_dev_stop(). I think it should be called after > ib_destroy_cm_id() and priv->cm.id = NULL. > Ralph, I have managed to recreate this crash a few times under stress. I expect to be able to try your patch some time next week, and will let you know. Thanks for taking time to look into this. Thanks Pradeep > On Thu, 2010-09-02 at 20:41 -0700, Pradeep Satyanarayana wrote: >> Ralph, >> >> I see the following crash sporadically (only under stress) with a Sles11SP1 (which is 2.6.32 kernel). >> I saw this crash with V4 of your patch and have not yet had a chance to try V5. Have you seen this >> in your testing? If this not the crash stack can you please share what your patch fixes? >> >> <4>ib0: RX drain timing out >> <4>idr_remove called for id=11491974 which is not allocated. >> <4>Call Trace: >> <4>[c000000749fe33b0] [c0000000000129e4] .show_stack+0x6c/0x198 (unreliable) >> <4>[c000000749fe3460] [c0000000002ea594] .sub_remove+0x1ec/0x1f8 >> <4>[c000000749fe3520] [c0000000002ea5e0] .idr_remove+0x40/0xf8 >> <4>[c000000749fe35b0] [d000000012d84d70] .cm_destroy_id+0xa0/0x520 [ib_cm] >> <4>[c000000749fe3680] [d00000001b7fb644] .ipoib_cm_free_rx_reap_list+0xd4/0x190 [ib_ipoib] >> <4>[c000000749fe3740] [d00000001b7fe404] .ipoib_cm_dev_stop+0x23c/0x360 [ib_ipoib] >> <4>[c000000749fe3800] [d00000001b7f4dbc] .ipoib_ib_dev_stop+0xe4/0x4b0 [ib_ipoib] >> <4>[c000000749fe3960] [d00000001b7f0f30] .ipoib_stop+0x88/0x178 [ib_ipoib] >> <4>[c000000749fe39f0] [c0000000004eacf4] .dev_close+0xdc/0x148 >> <4>[c000000749fe3a80] [c0000000004ea2b8] .dev_change_flags+0x1f0/0x288 >> <4>[c000000749fe3b20] [d00000001b7f11b8] .ipoib_remove_one+0xb8/0x140 [ib_ipoib] >> <4>[c000000749fe3bc0] [d00000001210425c] .ib_unregister_client+0xb4/0x1b8 [ib_core] >> <4>[c000000749fe3c90] [d00000001b7ffde8] .ipoib_cleanup_module+0x20/0x60 [ib_ipoib] >> <4>[c000000749fe3d20] [c0000000000ec408] .SyS_delete_module+0x238/0x320 >> <4>[c000000749fe3e30] [c0000000000085b4] syscall_exit+0x0/0x40 >> <1>Unable to handle kernel paging request for data at address 0x45000027228d1ffb >> <1>Faulting instruction address: 0xc0000000005a8e88 >> 12:mon> e >> cpu 0x12: Vector: 300 (Data Access) at [c000000749fe3250] >> pc: c0000000005a8e88: .wait_for_common+0xb8/0x268 >> lr: c0000000005a8e20: .wait_for_common+0x50/0x268 >> sp: c000000749fe34d0 >> msr: 8000000000009032 >> dar: 45000027228d1ffb >> dsisr: 42000000 >> current = 0xc00000074b4ce0e0 >> paca = 0xc000000000f64a00 >> pid = 13605, comm = modprobe >> 12:mon> >> >> Thanks >> Pradeep > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 0/1] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path [not found] ` <20100817203619.22174.62871.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org> 2010-08-17 20:36 ` [PATCH v5] " Ralph Campbell @ 2010-08-17 20:48 ` Ralph Campbell 1 sibling, 0 replies; 7+ messages in thread From: Ralph Campbell @ 2010-08-17 20:48 UTC (permalink / raw) To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org BTW, I will be around today and tomorrow to reply to comments but then I will be on vacation through August. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-10-01 22:38 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-17 20:36 [PATCH v5 0/1] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path Ralph Campbell
[not found] ` <20100817203619.22174.62871.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
2010-08-17 20:36 ` [PATCH v5] " Ralph Campbell
[not found] ` <20100817203624.22174.69480.stgit-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
2010-09-03 3:41 ` Pradeep Satyanarayana
[not found] ` <4C806E72.6030507-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-09-03 16:06 ` Ralph Campbell
2010-10-01 0:35 ` Ralph Campbell
[not found] ` <1285893348.22791.120.camel-/vjeY7uYZjrPXfVEPVhPGq6RkeBMCJyt@public.gmane.org>
2010-10-01 22:38 ` Pradeep Satyanarayana
2010-08-17 20:48 ` [PATCH v5 0/1] " Ralph Campbell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox