* [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3)
@ 2011-08-11 18:06 Neil Horman
2011-08-11 18:36 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Neil Horman @ 2011-08-11 18:06 UTC (permalink / raw)
To: linux-scsi
Cc: Neil Horman, Divy Le Ray, Steve Wise, Karen Xie, Mike Christie,
James E.J. Bottomley, stable-commits
This oops was reported recently:
d:mon> e
cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
sp: c0000000fd4c73a0
msr: 8000000000009032
dar: 0
dsisr: 40000000
current = 0xc0000000fd640d40
paca = 0xc00000000054ff80
pid = 5085, comm = iscsid
d:mon> t
[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
[scsi_transport_iscsi2]
[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000080da560cfc
The root cause was an EEH error, which sent us down the offload_close path in
the cxgb3 driver, which in turn sets cdev->lldev to NULL, without regard for
upper layer driver (like the cxgbi drivers) which might have execution contexts
in the middle of its use. The result is the oops above, when t3_l2t_get attempts
to dereference cdev->lldev right after the EEH error handler sets it to NULL.
The fix is to reference count the cdev structure. When an EEH error occurs, the
shutdown path:
t3_adapter_error->offload_close->cxgb3i_remove_clients->cxgb3i_dev_close
will now block until such time as the cdev pointer has a use count of zero.
This coupled with the fact that lookups will now skip finding any registered
cdev's in cxgbi_device_find_by_[lldev|netdev] with the CXGBI_FLAG_ADAPTER_RESET
bit set ensures that on an EEH, the setting of lldev to NULL in offload_close
will only happen after there are no longer any active users of the data
structure.
This has been tested by the reporter and shown to fix the reproted oops
Change notes:
v2) Resent because of some build warnings with the first version on request of
JBottomley
v3) Modified to use kref refcounting api. It doesn't do much for this code,
since the purpose of refcounting is primarily to serialize the clearing of a
pointer in another data structure in a different module, but I think it does
look a bit cleaner. I've built this code, but not had the chance to test it
yet, given that I'm about to leave on vacation. I wanted to get it posted for
review before I left. It should work fine, since its effectively a replacement
of the raw atomic with struct krefs. Karen, if you could run this on some
cxgb3i equipment to make sure I didn't inadvertently do anything stupid, I'd
aprpeciate it. Thanks!
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Divy Le Ray <divy@chelsio.com>
CC: Steve Wise <swise@chelsio.com>
CC: Karen Xie <kxie@chelsio.com>
CC: Mike Christie <michaelc@cs.wisc.edu>
CC: "James E.J. Bottomley" <JBottomley@parallels.com>
CC: stable-commits@vger.kernel.org
---
drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 8 +++++++-
drivers/scsi/cxgbi/libcxgbi.c | 21 +++++++++++++++++----
drivers/scsi/cxgbi/libcxgbi.h | 5 +++++
3 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index bd22041..855ee23 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -1303,9 +1303,13 @@ static void cxgb3i_dev_close(struct t3cdev *t3dev)
if (!cdev || cdev->flags & CXGBI_FLAG_ADAPTER_RESET) {
pr_info("0x%p close, f 0x%x.\n", cdev, cdev ? cdev->flags : 0);
+ if (cdev)
+ cdev_put(cdev);
+ while (cdev && atomic_read(&cdev->use_count.refcount) > 1)
+ msleep(1);
return;
}
-
+ cdev_put(cdev);
cxgbi_device_unregister(cdev);
}
@@ -1320,6 +1324,7 @@ static void cxgb3i_dev_open(struct t3cdev *t3dev)
int i, err;
if (cdev) {
+ cdev_put(cdev);
pr_info("0x%p, updating.\n", cdev);
return;
}
@@ -1392,6 +1397,7 @@ static void cxgb3i_dev_event_handler(struct t3cdev *t3dev, u32 event, u32 port)
cdev->flags &= ~CXGBI_FLAG_ADAPTER_RESET;
break;
}
+ cdev_put(cdev);
}
/**
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 77ac217..297efaa 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -95,8 +95,10 @@ void cxgbi_device_portmap_cleanup(struct cxgbi_device *cdev)
}
EXPORT_SYMBOL_GPL(cxgbi_device_portmap_cleanup);
-static inline void cxgbi_device_destroy(struct cxgbi_device *cdev)
+void cxgbi_device_destroy(struct kref *kr)
{
+ struct cxgbi_device *cdev = container_of(kr, struct cxgbi_device, use_count);
+
log_debug(1 << CXGBI_DBG_DEV,
"cdev 0x%p, p# %u.\n", cdev, cdev->nports);
cxgbi_hbas_remove(cdev);
@@ -111,6 +113,7 @@ static inline void cxgbi_device_destroy(struct cxgbi_device *cdev)
cxgbi_free_big_mem(cdev->pmap.port_csk);
kfree(cdev);
}
+EXPORT_SYMBOL_GPL(cxgbi_device_destroy);
struct cxgbi_device *cxgbi_device_register(unsigned int extra,
unsigned int nports)
@@ -125,6 +128,7 @@ struct cxgbi_device *cxgbi_device_register(unsigned int extra,
pr_warn("nport %d, OOM.\n", nports);
return NULL;
}
+ kref_init(&cdev->use_count);
cdev->ports = (struct net_device **)(cdev + 1);
cdev->hbas = (struct cxgbi_hba **)(((char*)cdev->ports) + nports *
sizeof(struct net_device *));
@@ -151,7 +155,7 @@ void cxgbi_device_unregister(struct cxgbi_device *cdev)
mutex_lock(&cdev_mutex);
list_del(&cdev->list_head);
mutex_unlock(&cdev_mutex);
- cxgbi_device_destroy(cdev);
+ cdev_put(cdev);
}
EXPORT_SYMBOL_GPL(cxgbi_device_unregister);
@@ -167,7 +171,7 @@ void cxgbi_device_unregister_all(unsigned int flag)
cdev, cdev->nports, cdev->nports ?
cdev->ports[0]->name : "");
list_del(&cdev->list_head);
- cxgbi_device_destroy(cdev);
+ cdev_put(cdev);
}
}
mutex_unlock(&cdev_mutex);
@@ -181,6 +185,9 @@ struct cxgbi_device *cxgbi_device_find_by_lldev(void *lldev)
mutex_lock(&cdev_mutex);
list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
if (cdev->lldev == lldev) {
+ if (cdev->flags & CXGBI_FLAG_ADAPTER_RESET)
+ continue;
+ cdev_hold(cdev);
mutex_unlock(&cdev_mutex);
return cdev;
}
@@ -210,7 +217,10 @@ static struct cxgbi_device *cxgbi_device_find_by_netdev(struct net_device *ndev,
list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
for (i = 0; i < cdev->nports; i++) {
if (ndev == cdev->ports[i]) {
+ if (cdev->flags & CXGBI_FLAG_ADAPTER_RESET)
+ continue;
cdev->hbas[i]->vdev = vdev;
+ cdev_hold(cdev);
mutex_unlock(&cdev_mutex);
if (port)
*port = i;
@@ -469,7 +479,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
struct sockaddr_in *daddr = (struct sockaddr_in *)dst_addr;
struct dst_entry *dst;
struct net_device *ndev;
- struct cxgbi_device *cdev;
+ struct cxgbi_device *cdev = NULL;
struct rtable *rt = NULL;
struct flowi4 fl4;
struct cxgbi_sock *csk = NULL;
@@ -542,6 +552,8 @@ rel_rt:
if (csk)
cxgbi_sock_closed(csk);
err_out:
+ if (cdev)
+ cdev_put(cdev);
return ERR_PTR(err);
}
@@ -2491,6 +2503,7 @@ struct iscsi_endpoint *cxgbi_ep_connect(struct Scsi_Host *shost,
return ep;
release_conn:
+ cdev_put(csk->cdev);
cxgbi_sock_put(csk);
cxgbi_sock_closed(csk);
err_out:
diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
index 9267844..08788ac 100644
--- a/drivers/scsi/cxgbi/libcxgbi.h
+++ b/drivers/scsi/cxgbi/libcxgbi.h
@@ -24,6 +24,7 @@
#include <linux/scatterlist.h>
#include <linux/skbuff.h>
#include <linux/vmalloc.h>
+#include <linux/kref.h>
#include <scsi/scsi_device.h>
#include <scsi/libiscsi_tcp.h>
@@ -514,6 +515,7 @@ struct cxgbi_device {
unsigned int flags;
struct net_device **ports;
void *lldev;
+ struct kref use_count;
struct cxgbi_hba **hbas;
const unsigned short *mtus;
unsigned char nmtus;
@@ -557,6 +559,8 @@ struct cxgbi_device {
void *dd_data;
};
#define cxgbi_cdev_priv(cdev) ((cdev)->dd_data)
+#define cdev_hold(x) do {kref_get(&(x)->use_count);} while(0)
+#define cdev_put(x) do {kref_put(&(x)->use_count, cxgbi_device_destroy);} while(0)
struct cxgbi_conn {
struct cxgbi_endpoint *cep;
@@ -691,6 +695,7 @@ static inline __be32 cxgbi_get_iscsi_ipv4(struct cxgbi_hba *chba)
struct cxgbi_device *cxgbi_device_register(unsigned int, unsigned int);
void cxgbi_device_unregister(struct cxgbi_device *);
void cxgbi_device_unregister_all(unsigned int flag);
+void cxgbi_device_destroy(struct kref *kr);
struct cxgbi_device *cxgbi_device_find_by_lldev(void *);
int cxgbi_hbas_add(struct cxgbi_device *, unsigned int, unsigned int,
struct scsi_host_template *,
--
1.7.6
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3)
2011-08-11 18:06 [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3) Neil Horman
@ 2011-08-11 18:36 ` James Bottomley
2011-08-11 19:11 ` Neil Horman
0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2011-08-11 18:36 UTC (permalink / raw)
To: Neil Horman
Cc: linux-scsi@vger.kernel.org, Divy Le Ray, Steve Wise, Karen Xie,
Mike Christie, stable-commits@vger.kernel.org
On Thu, 2011-08-11 at 14:06 -0400, Neil Horman wrote:
> This oops was reported recently:
> d:mon> e
> cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
> pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
> lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
> sp: c0000000fd4c73a0
> msr: 8000000000009032
> dar: 0
> dsisr: 40000000
> current = 0xc0000000fd640d40
> paca = 0xc00000000054ff80
> pid = 5085, comm = iscsid
> d:mon> t
> [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
> [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
> [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
> [scsi_transport_iscsi2]
> [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
> [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
> [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
> [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
> [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
> [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
> [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 00000080da560cfc
>
> The root cause was an EEH error, which sent us down the offload_close path in
> the cxgb3 driver, which in turn sets cdev->lldev to NULL, without regard for
> upper layer driver (like the cxgbi drivers) which might have execution contexts
> in the middle of its use. The result is the oops above, when t3_l2t_get attempts
> to dereference cdev->lldev right after the EEH error handler sets it to NULL.
>
> The fix is to reference count the cdev structure. When an EEH error occurs, the
> shutdown path:
> t3_adapter_error->offload_close->cxgb3i_remove_clients->cxgb3i_dev_close
> will now block until such time as the cdev pointer has a use count of zero.
> This coupled with the fact that lookups will now skip finding any registered
> cdev's in cxgbi_device_find_by_[lldev|netdev] with the CXGBI_FLAG_ADAPTER_RESET
> bit set ensures that on an EEH, the setting of lldev to NULL in offload_close
> will only happen after there are no longer any active users of the data
> structure.
>
> This has been tested by the reporter and shown to fix the reproted oops
>
> Change notes:
>
> v2) Resent because of some build warnings with the first version on request of
> JBottomley
>
> v3) Modified to use kref refcounting api. It doesn't do much for this code,
> since the purpose of refcounting is primarily to serialize the clearing of a
> pointer in another data structure in a different module, but I think it does
> look a bit cleaner. I've built this code, but not had the chance to test it
> yet, given that I'm about to leave on vacation. I wanted to get it posted for
> review before I left. It should work fine, since its effectively a replacement
> of the raw atomic with struct krefs. Karen, if you could run this on some
> cxgb3i equipment to make sure I didn't inadvertently do anything stupid, I'd
> aprpeciate it. Thanks!
With any of these updates, doesn't cxgb4i also need updating?
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Divy Le Ray <divy@chelsio.com>
> CC: Steve Wise <swise@chelsio.com>
> CC: Karen Xie <kxie@chelsio.com>
> CC: Mike Christie <michaelc@cs.wisc.edu>
> CC: "James E.J. Bottomley" <JBottomley@parallels.com>
> CC: stable-commits@vger.kernel.org
> ---
> drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 8 +++++++-
> drivers/scsi/cxgbi/libcxgbi.c | 21 +++++++++++++++++----
> drivers/scsi/cxgbi/libcxgbi.h | 5 +++++
> 3 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> index bd22041..855ee23 100644
> --- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> @@ -1303,9 +1303,13 @@ static void cxgb3i_dev_close(struct t3cdev *t3dev)
>
> if (!cdev || cdev->flags & CXGBI_FLAG_ADAPTER_RESET) {
> pr_info("0x%p close, f 0x%x.\n", cdev, cdev ? cdev->flags : 0);
> + if (cdev)
> + cdev_put(cdev);
> + while (cdev && atomic_read(&cdev->use_count.refcount) > 1)
> + msleep(1);
OK, so this was what leapt out the last time as a big no-no. Firstly,
do you really have to wait for destruction? If so, use a completion.
> return;
> }
> -
> + cdev_put(cdev);
> cxgbi_device_unregister(cdev);
> }
>
> @@ -1320,6 +1324,7 @@ static void cxgb3i_dev_open(struct t3cdev *t3dev)
> int i, err;
>
> if (cdev) {
> + cdev_put(cdev);
> pr_info("0x%p, updating.\n", cdev);
Other way around, otherwise it's use after free.
> return;
> }
> @@ -1392,6 +1397,7 @@ static void cxgb3i_dev_event_handler(struct t3cdev *t3dev, u32 event, u32 port)
> cdev->flags &= ~CXGBI_FLAG_ADAPTER_RESET;
> break;
> }
> + cdev_put(cdev);
> }
>
> /**
> diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
> index 77ac217..297efaa 100644
> --- a/drivers/scsi/cxgbi/libcxgbi.c
> +++ b/drivers/scsi/cxgbi/libcxgbi.c
> @@ -95,8 +95,10 @@ void cxgbi_device_portmap_cleanup(struct cxgbi_device *cdev)
> }
> EXPORT_SYMBOL_GPL(cxgbi_device_portmap_cleanup);
>
> -static inline void cxgbi_device_destroy(struct cxgbi_device *cdev)
> +void cxgbi_device_destroy(struct kref *kr)
> {
> + struct cxgbi_device *cdev = container_of(kr, struct cxgbi_device, use_count);
> +
> log_debug(1 << CXGBI_DBG_DEV,
> "cdev 0x%p, p# %u.\n", cdev, cdev->nports);
> cxgbi_hbas_remove(cdev);
> @@ -111,6 +113,7 @@ static inline void cxgbi_device_destroy(struct cxgbi_device *cdev)
> cxgbi_free_big_mem(cdev->pmap.port_csk);
> kfree(cdev);
> }
> +EXPORT_SYMBOL_GPL(cxgbi_device_destroy);
There's no cross module use, is there? so no need of an export.
> struct cxgbi_device *cxgbi_device_register(unsigned int extra,
> unsigned int nports)
> @@ -125,6 +128,7 @@ struct cxgbi_device *cxgbi_device_register(unsigned int extra,
> pr_warn("nport %d, OOM.\n", nports);
> return NULL;
> }
> + kref_init(&cdev->use_count);
> cdev->ports = (struct net_device **)(cdev + 1);
> cdev->hbas = (struct cxgbi_hba **)(((char*)cdev->ports) + nports *
> sizeof(struct net_device *));
> @@ -151,7 +155,7 @@ void cxgbi_device_unregister(struct cxgbi_device *cdev)
> mutex_lock(&cdev_mutex);
> list_del(&cdev->list_head);
> mutex_unlock(&cdev_mutex);
> - cxgbi_device_destroy(cdev);
> + cdev_put(cdev);
> }
> EXPORT_SYMBOL_GPL(cxgbi_device_unregister);
>
> @@ -167,7 +171,7 @@ void cxgbi_device_unregister_all(unsigned int flag)
> cdev, cdev->nports, cdev->nports ?
> cdev->ports[0]->name : "");
> list_del(&cdev->list_head);
> - cxgbi_device_destroy(cdev);
> + cdev_put(cdev);
> }
> }
> mutex_unlock(&cdev_mutex);
> @@ -181,6 +185,9 @@ struct cxgbi_device *cxgbi_device_find_by_lldev(void *lldev)
> mutex_lock(&cdev_mutex);
> list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
> if (cdev->lldev == lldev) {
> + if (cdev->flags & CXGBI_FLAG_ADAPTER_RESET)
> + continue;
> + cdev_hold(cdev);
> mutex_unlock(&cdev_mutex);
> return cdev;
> }
> @@ -210,7 +217,10 @@ static struct cxgbi_device *cxgbi_device_find_by_netdev(struct net_device *ndev,
> list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
> for (i = 0; i < cdev->nports; i++) {
> if (ndev == cdev->ports[i]) {
> + if (cdev->flags & CXGBI_FLAG_ADAPTER_RESET)
> + continue;
> cdev->hbas[i]->vdev = vdev;
> + cdev_hold(cdev);
> mutex_unlock(&cdev_mutex);
> if (port)
> *port = i;
> @@ -469,7 +479,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
> struct sockaddr_in *daddr = (struct sockaddr_in *)dst_addr;
> struct dst_entry *dst;
> struct net_device *ndev;
> - struct cxgbi_device *cdev;
> + struct cxgbi_device *cdev = NULL;
> struct rtable *rt = NULL;
> struct flowi4 fl4;
> struct cxgbi_sock *csk = NULL;
> @@ -542,6 +552,8 @@ rel_rt:
> if (csk)
> cxgbi_sock_closed(csk);
> err_out:
> + if (cdev)
> + cdev_put(cdev);
> return ERR_PTR(err);
> }
>
> @@ -2491,6 +2503,7 @@ struct iscsi_endpoint *cxgbi_ep_connect(struct Scsi_Host *shost,
> return ep;
>
> release_conn:
> + cdev_put(csk->cdev);
This isn't right. an embedded device reference should be released in
the csk release function (cxgbi_sock_free).
> cxgbi_sock_put(csk);
> cxgbi_sock_closed(csk);
> err_out:
> diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
> index 9267844..08788ac 100644
> --- a/drivers/scsi/cxgbi/libcxgbi.h
> +++ b/drivers/scsi/cxgbi/libcxgbi.h
> @@ -24,6 +24,7 @@
> #include <linux/scatterlist.h>
> #include <linux/skbuff.h>
> #include <linux/vmalloc.h>
> +#include <linux/kref.h>
> #include <scsi/scsi_device.h>
> #include <scsi/libiscsi_tcp.h>
>
> @@ -514,6 +515,7 @@ struct cxgbi_device {
> unsigned int flags;
> struct net_device **ports;
> void *lldev;
> + struct kref use_count;
> struct cxgbi_hba **hbas;
> const unsigned short *mtus;
> unsigned char nmtus;
> @@ -557,6 +559,8 @@ struct cxgbi_device {
> void *dd_data;
> };
> #define cxgbi_cdev_priv(cdev) ((cdev)->dd_data)
> +#define cdev_hold(x) do {kref_get(&(x)->use_count);} while(0)
hold is non standard, use put instead.
> +#define cdev_put(x) do {kref_put(&(x)->use_count, cxgbi_device_destroy);} while(0)
>
> struct cxgbi_conn {
> struct cxgbi_endpoint *cep;
> @@ -691,6 +695,7 @@ static inline __be32 cxgbi_get_iscsi_ipv4(struct cxgbi_hba *chba)
> struct cxgbi_device *cxgbi_device_register(unsigned int, unsigned int);
> void cxgbi_device_unregister(struct cxgbi_device *);
> void cxgbi_device_unregister_all(unsigned int flag);
> +void cxgbi_device_destroy(struct kref *kr);
> struct cxgbi_device *cxgbi_device_find_by_lldev(void *);
> int cxgbi_hbas_add(struct cxgbi_device *, unsigned int, unsigned int,
> struct scsi_host_template *,
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3)
2011-08-11 18:36 ` James Bottomley
@ 2011-08-11 19:11 ` Neil Horman
2011-08-11 19:47 ` Karen Xie
2011-08-11 20:18 ` James Bottomley
0 siblings, 2 replies; 7+ messages in thread
From: Neil Horman @ 2011-08-11 19:11 UTC (permalink / raw)
To: James Bottomley
Cc: linux-scsi@vger.kernel.org, Divy Le Ray, Steve Wise, Karen Xie,
Mike Christie, stable-commits@vger.kernel.org
On Thu, Aug 11, 2011 at 06:36:10PM +0000, James Bottomley wrote:
> On Thu, 2011-08-11 at 14:06 -0400, Neil Horman wrote:
> > This oops was reported recently:
> > d:mon> e
> > cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
> > pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
> > lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
> > sp: c0000000fd4c73a0
> > msr: 8000000000009032
> > dar: 0
> > dsisr: 40000000
> > current = 0xc0000000fd640d40
> > paca = 0xc00000000054ff80
> > pid = 5085, comm = iscsid
> > d:mon> t
> > [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
> > [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
> > [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
> > [scsi_transport_iscsi2]
> > [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
> > [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
> > [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
> > [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
> > [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
> > [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
> > [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
> > --- Exception: c01 (System Call) at 00000080da560cfc
> >
> > The root cause was an EEH error, which sent us down the offload_close path in
> > the cxgb3 driver, which in turn sets cdev->lldev to NULL, without regard for
> > upper layer driver (like the cxgbi drivers) which might have execution contexts
> > in the middle of its use. The result is the oops above, when t3_l2t_get attempts
> > to dereference cdev->lldev right after the EEH error handler sets it to NULL.
> >
> > The fix is to reference count the cdev structure. When an EEH error occurs, the
> > shutdown path:
> > t3_adapter_error->offload_close->cxgb3i_remove_clients->cxgb3i_dev_close
> > will now block until such time as the cdev pointer has a use count of zero.
> > This coupled with the fact that lookups will now skip finding any registered
> > cdev's in cxgbi_device_find_by_[lldev|netdev] with the CXGBI_FLAG_ADAPTER_RESET
> > bit set ensures that on an EEH, the setting of lldev to NULL in offload_close
> > will only happen after there are no longer any active users of the data
> > structure.
> >
> > This has been tested by the reporter and shown to fix the reproted oops
> >
> > Change notes:
> >
> > v2) Resent because of some build warnings with the first version on request of
> > JBottomley
> >
> > v3) Modified to use kref refcounting api. It doesn't do much for this code,
> > since the purpose of refcounting is primarily to serialize the clearing of a
> > pointer in another data structure in a different module, but I think it does
> > look a bit cleaner. I've built this code, but not had the chance to test it
> > yet, given that I'm about to leave on vacation. I wanted to get it posted for
> > review before I left. It should work fine, since its effectively a replacement
> > of the raw atomic with struct krefs. Karen, if you could run this on some
> > cxgb3i equipment to make sure I didn't inadvertently do anything stupid, I'd
> > aprpeciate it. Thanks!
>
> With any of these updates, doesn't cxgb4i also need updating?
>
Possibly, I've not looked (nor had the opportunity to get testing) done there
yet.
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Divy Le Ray <divy@chelsio.com>
> > CC: Steve Wise <swise@chelsio.com>
> > CC: Karen Xie <kxie@chelsio.com>
> > CC: Mike Christie <michaelc@cs.wisc.edu>
> > CC: "James E.J. Bottomley" <JBottomley@parallels.com>
> > CC: stable-commits@vger.kernel.org
> > ---
> > drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 8 +++++++-
> > drivers/scsi/cxgbi/libcxgbi.c | 21 +++++++++++++++++----
> > drivers/scsi/cxgbi/libcxgbi.h | 5 +++++
> > 3 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > index bd22041..855ee23 100644
> > --- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > @@ -1303,9 +1303,13 @@ static void cxgb3i_dev_close(struct t3cdev *t3dev)
> >
> > if (!cdev || cdev->flags & CXGBI_FLAG_ADAPTER_RESET) {
> > pr_info("0x%p close, f 0x%x.\n", cdev, cdev ? cdev->flags : 0);
> > + if (cdev)
> > + cdev_put(cdev);
> > + while (cdev && atomic_read(&cdev->use_count.refcount) > 1)
> > + msleep(1);
>
> OK, so this was what leapt out the last time as a big no-no. Firstly,
> do you really have to wait for destruction? If so, use a completion.
>
So, this is really the crux of the issue. I'm sorry If it wasn't sufficiently
clear in the explination above. Let me try again:
The root of the problem is the racy assignments of ->lldev in offload_open and
ofload_close. cdev->lldev is dereferenced in init_act_open every time a new
iscsi connection is established. If we encounter an adapter error and go down
the cxgb3 EEH path, we wind up in offload_close, which sets->lldev to NULL. If
this occurs while a connection is being established in init_act_open, we can get
a NULL pointer as ->lldev, and we get the oops above.
So my approach to fix this was to serialize the setting of cdev->lldev to NULL behind
a guarantee that there are no current users of cdev. Since the cxgb3i driver
only grabs cdevs via cxgbi_find_by_lldev or find_by_netdev, a reference count
seemed like the way to go. Not sure I could do the same efficiently with a
completion.
I had also thought about moving the NULL setting into part of the release
routine of kref_put (or previously my raw atomic use), but since that code (or
the code that accesses it isn't at all serialized with a spinlock, I figured a
reference count would be more efficient. It all boils down (In my mind) to the
fact that we're just modifying a pointer here, not actually dstroying a struct.
Truthfully, the way this code works probably needs a larger re-working than what
I'm doing here, given how often I got turned around in it, but I'm nowhere near
familiar enough with this driver to do that effectively. I'm just trying to fix
the oops at hand.
> > return;
> > }
> > -
> > + cdev_put(cdev);
> > cxgbi_device_unregister(cdev);
> > }
> >
> > @@ -1320,6 +1324,7 @@ static void cxgb3i_dev_open(struct t3cdev *t3dev)
> > int i, err;
> >
> > if (cdev) {
> > + cdev_put(cdev);
> > pr_info("0x%p, updating.\n", cdev);
>
> Other way around, otherwise it's use after free.
>
Doh, thanks!
> > return;
> > }
> > @@ -1392,6 +1397,7 @@ static void cxgb3i_dev_event_handler(struct t3cdev *t3dev, u32 event, u32 port)
> > cdev->flags &= ~CXGBI_FLAG_ADAPTER_RESET;
> > break;
> > }
> > + cdev_put(cdev);
> > }
> >
> > /**
> > diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
> > index 77ac217..297efaa 100644
> > --- a/drivers/scsi/cxgbi/libcxgbi.c
> > +++ b/drivers/scsi/cxgbi/libcxgbi.c
> > @@ -95,8 +95,10 @@ void cxgbi_device_portmap_cleanup(struct cxgbi_device *cdev)
> > }
> > EXPORT_SYMBOL_GPL(cxgbi_device_portmap_cleanup);
> >
> > -static inline void cxgbi_device_destroy(struct cxgbi_device *cdev)
> > +void cxgbi_device_destroy(struct kref *kr)
> > {
> > + struct cxgbi_device *cdev = container_of(kr, struct cxgbi_device, use_count);
> > +
> > log_debug(1 << CXGBI_DBG_DEV,
> > "cdev 0x%p, p# %u.\n", cdev, cdev->nports);
> > cxgbi_hbas_remove(cdev);
> > @@ -111,6 +113,7 @@ static inline void cxgbi_device_destroy(struct cxgbi_device *cdev)
> > cxgbi_free_big_mem(cdev->pmap.port_csk);
> > kfree(cdev);
> > }
> > +EXPORT_SYMBOL_GPL(cxgbi_device_destroy);
>
> There's no cross module use, is there? so no need of an export.
I suppose not. Sorry, was thinking that libcxgbi had common code that wen't in
its own module.
>
> > struct cxgbi_device *cxgbi_device_register(unsigned int extra,
> > unsigned int nports)
> > @@ -125,6 +128,7 @@ struct cxgbi_device *cxgbi_device_register(unsigned int extra,
> > pr_warn("nport %d, OOM.\n", nports);
> > return NULL;
> > }
> > + kref_init(&cdev->use_count);
> > cdev->ports = (struct net_device **)(cdev + 1);
> > cdev->hbas = (struct cxgbi_hba **)(((char*)cdev->ports) + nports *
> > sizeof(struct net_device *));
> > @@ -151,7 +155,7 @@ void cxgbi_device_unregister(struct cxgbi_device *cdev)
> > mutex_lock(&cdev_mutex);
> > list_del(&cdev->list_head);
> > mutex_unlock(&cdev_mutex);
> > - cxgbi_device_destroy(cdev);
> > + cdev_put(cdev);
> > }
> > EXPORT_SYMBOL_GPL(cxgbi_device_unregister);
> >
> > @@ -167,7 +171,7 @@ void cxgbi_device_unregister_all(unsigned int flag)
> > cdev, cdev->nports, cdev->nports ?
> > cdev->ports[0]->name : "");
> > list_del(&cdev->list_head);
> > - cxgbi_device_destroy(cdev);
> > + cdev_put(cdev);
> > }
> > }
> > mutex_unlock(&cdev_mutex);
> > @@ -181,6 +185,9 @@ struct cxgbi_device *cxgbi_device_find_by_lldev(void *lldev)
> > mutex_lock(&cdev_mutex);
> > list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
> > if (cdev->lldev == lldev) {
> > + if (cdev->flags & CXGBI_FLAG_ADAPTER_RESET)
> > + continue;
> > + cdev_hold(cdev);
> > mutex_unlock(&cdev_mutex);
> > return cdev;
> > }
> > @@ -210,7 +217,10 @@ static struct cxgbi_device *cxgbi_device_find_by_netdev(struct net_device *ndev,
> > list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
> > for (i = 0; i < cdev->nports; i++) {
> > if (ndev == cdev->ports[i]) {
> > + if (cdev->flags & CXGBI_FLAG_ADAPTER_RESET)
> > + continue;
> > cdev->hbas[i]->vdev = vdev;
> > + cdev_hold(cdev);
> > mutex_unlock(&cdev_mutex);
> > if (port)
> > *port = i;
> > @@ -469,7 +479,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
> > struct sockaddr_in *daddr = (struct sockaddr_in *)dst_addr;
> > struct dst_entry *dst;
> > struct net_device *ndev;
> > - struct cxgbi_device *cdev;
> > + struct cxgbi_device *cdev = NULL;
> > struct rtable *rt = NULL;
> > struct flowi4 fl4;
> > struct cxgbi_sock *csk = NULL;
> > @@ -542,6 +552,8 @@ rel_rt:
> > if (csk)
> > cxgbi_sock_closed(csk);
> > err_out:
> > + if (cdev)
> > + cdev_put(cdev);
> > return ERR_PTR(err);
> > }
> >
> > @@ -2491,6 +2503,7 @@ struct iscsi_endpoint *cxgbi_ep_connect(struct Scsi_Host *shost,
> > return ep;
> >
> > release_conn:
> > + cdev_put(csk->cdev);
>
> This isn't right. an embedded device reference should be released in
> the csk release function (cxgbi_sock_free).
>
Ok, I can fix that
> > cxgbi_sock_put(csk);
> > cxgbi_sock_closed(csk);
> > err_out:
> > diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
> > index 9267844..08788ac 100644
> > --- a/drivers/scsi/cxgbi/libcxgbi.h
> > +++ b/drivers/scsi/cxgbi/libcxgbi.h
> > @@ -24,6 +24,7 @@
> > #include <linux/scatterlist.h>
> > #include <linux/skbuff.h>
> > #include <linux/vmalloc.h>
> > +#include <linux/kref.h>
> > #include <scsi/scsi_device.h>
> > #include <scsi/libiscsi_tcp.h>
> >
> > @@ -514,6 +515,7 @@ struct cxgbi_device {
> > unsigned int flags;
> > struct net_device **ports;
> > void *lldev;
> > + struct kref use_count;
> > struct cxgbi_hba **hbas;
> > const unsigned short *mtus;
> > unsigned char nmtus;
> > @@ -557,6 +559,8 @@ struct cxgbi_device {
> > void *dd_data;
> > };
> > #define cxgbi_cdev_priv(cdev) ((cdev)->dd_data)
> > +#define cdev_hold(x) do {kref_get(&(x)->use_count);} while(0)
>
> hold is non standard, use put instead.
>
You mean get? I can do that.
Given what I noted above, do you think its worth pursuing this? Or does this
need a larger fix than what a refcount can offer. We want to be able to
tdel_lldev to NULL in offload_close safely, when there are no other users of
that structure. I can't really see a way to do it without completely
overhauling how this code works. I suppose we could do some rcu magic to do the
serialization, but I'm nto sure that would be much better. Let me know if
you're ok with the above wait loop, and if so, I'll make the other corrections
you've noted and repost
Regards
Neil
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3)
2011-08-11 19:11 ` Neil Horman
@ 2011-08-11 19:47 ` Karen Xie
2011-08-11 20:25 ` Neil Horman
2011-08-11 20:18 ` James Bottomley
1 sibling, 1 reply; 7+ messages in thread
From: Karen Xie @ 2011-08-11 19:47 UTC (permalink / raw)
To: Neil Horman, James Bottomley
Cc: linux-scsi, Divy Le Ray, Steve Wise, Mike Christie,
stable-commits
Hi, Neil,
Do you mind if I take this patch and modify it for both cxgb3i/4i
drivers?
Thanks,
Karen
-----Original Message-----
From: Neil Horman [mailto:nhorman@tuxdriver.com]
Sent: Thursday, August 11, 2011 12:12 PM
To: James Bottomley
Cc: linux-scsi@vger.kernel.org; Divy Le Ray; Steve Wise; Karen Xie; Mike
Christie; stable-commits@vger.kernel.org
Subject: Re: [PATCH] cxgb3i: ref count cdev access to prevent
modification while in use (v3)
On Thu, Aug 11, 2011 at 06:36:10PM +0000, James Bottomley wrote:
> On Thu, 2011-08-11 at 14:06 -0400, Neil Horman wrote:
> > This oops was reported recently:
> > d:mon> e
> > cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
> > pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
> > lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
> > sp: c0000000fd4c73a0
> > msr: 8000000000009032
> > dar: 0
> > dsisr: 40000000
> > current = 0xc0000000fd640d40
> > paca = 0xc00000000054ff80
> > pid = 5085, comm = iscsid
> > d:mon> t
> > [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4
[cxgb3i]
> > [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8
[libcxgbi]
> > [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
> > [scsi_transport_iscsi2]
> > [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
> > [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
> > [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
> > [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
> > [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
> > [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
> > [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
> > --- Exception: c01 (System Call) at 00000080da560cfc
> >
> > The root cause was an EEH error, which sent us down the
offload_close path in
> > the cxgb3 driver, which in turn sets cdev->lldev to NULL, without
regard for
> > upper layer driver (like the cxgbi drivers) which might have
execution contexts
> > in the middle of its use. The result is the oops above, when
t3_l2t_get attempts
> > to dereference cdev->lldev right after the EEH error handler sets it
to NULL.
> >
> > The fix is to reference count the cdev structure. When an EEH error
occurs, the
> > shutdown path:
> >
t3_adapter_error->offload_close->cxgb3i_remove_clients->cxgb3i_dev_close
> > will now block until such time as the cdev pointer has a use count
of zero.
> > This coupled with the fact that lookups will now skip finding any
registered
> > cdev's in cxgbi_device_find_by_[lldev|netdev] with the
CXGBI_FLAG_ADAPTER_RESET
> > bit set ensures that on an EEH, the setting of lldev to NULL in
offload_close
> > will only happen after there are no longer any active users of the
data
> > structure.
> >
> > This has been tested by the reporter and shown to fix the reproted
oops
> >
> > Change notes:
> >
> > v2) Resent because of some build warnings with the first version on
request of
> > JBottomley
> >
> > v3) Modified to use kref refcounting api. It doesn't do much for
this code,
> > since the purpose of refcounting is primarily to serialize the
clearing of a
> > pointer in another data structure in a different module, but I think
it does
> > look a bit cleaner. I've built this code, but not had the chance to
test it
> > yet, given that I'm about to leave on vacation. I wanted to get it
posted for
> > review before I left. It should work fine, since its effectively a
replacement
> > of the raw atomic with struct krefs. Karen, if you could run this
on some
> > cxgb3i equipment to make sure I didn't inadvertently do anything
stupid, I'd
> > aprpeciate it. Thanks!
>
> With any of these updates, doesn't cxgb4i also need updating?
>
Possibly, I've not looked (nor had the opportunity to get testing) done
there
yet.
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Divy Le Ray <divy@chelsio.com>
> > CC: Steve Wise <swise@chelsio.com>
> > CC: Karen Xie <kxie@chelsio.com>
> > CC: Mike Christie <michaelc@cs.wisc.edu>
> > CC: "James E.J. Bottomley" <JBottomley@parallels.com>
> > CC: stable-commits@vger.kernel.org
> > ---
> > drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 8 +++++++-
> > drivers/scsi/cxgbi/libcxgbi.c | 21 +++++++++++++++++----
> > drivers/scsi/cxgbi/libcxgbi.h | 5 +++++
> > 3 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > index bd22041..855ee23 100644
> > --- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > @@ -1303,9 +1303,13 @@ static void cxgb3i_dev_close(struct t3cdev
*t3dev)
> >
> > if (!cdev || cdev->flags & CXGBI_FLAG_ADAPTER_RESET) {
> > pr_info("0x%p close, f 0x%x.\n", cdev, cdev ?
cdev->flags : 0);
> > + if (cdev)
> > + cdev_put(cdev);
> > + while (cdev && atomic_read(&cdev->use_count.refcount) >
1)
> > + msleep(1);
>
> OK, so this was what leapt out the last time as a big no-no. Firstly,
> do you really have to wait for destruction? If so, use a completion.
>
So, this is really the crux of the issue. I'm sorry If it wasn't
sufficiently
clear in the explination above. Let me try again:
The root of the problem is the racy assignments of ->lldev in
offload_open and
ofload_close. cdev->lldev is dereferenced in init_act_open every time a
new
iscsi connection is established. If we encounter an adapter error and
go down
the cxgb3 EEH path, we wind up in offload_close, which sets->lldev to
NULL. If
this occurs while a connection is being established in init_act_open, we
can get
a NULL pointer as ->lldev, and we get the oops above.
So my approach to fix this was to serialize the setting of cdev->lldev
to NULL behind
a guarantee that there are no current users of cdev. Since the cxgb3i
driver
only grabs cdevs via cxgbi_find_by_lldev or find_by_netdev, a reference
count
seemed like the way to go. Not sure I could do the same efficiently
with a
completion.
I had also thought about moving the NULL setting into part of the
release
routine of kref_put (or previously my raw atomic use), but since that
code (or
the code that accesses it isn't at all serialized with a spinlock, I
figured a
reference count would be more efficient. It all boils down (In my mind)
to the
fact that we're just modifying a pointer here, not actually dstroying a
struct.
Truthfully, the way this code works probably needs a larger re-working
than what
I'm doing here, given how often I got turned around in it, but I'm
nowhere near
familiar enough with this driver to do that effectively. I'm just
trying to fix
the oops at hand.
> > return;
> > }
> > -
> > + cdev_put(cdev);
> > cxgbi_device_unregister(cdev);
> > }
> >
> > @@ -1320,6 +1324,7 @@ static void cxgb3i_dev_open(struct t3cdev
*t3dev)
> > int i, err;
> >
> > if (cdev) {
> > + cdev_put(cdev);
> > pr_info("0x%p, updating.\n", cdev);
>
> Other way around, otherwise it's use after free.
>
Doh, thanks!
> > return;
> > }
> > @@ -1392,6 +1397,7 @@ static void cxgb3i_dev_event_handler(struct
t3cdev *t3dev, u32 event, u32 port)
> > cdev->flags &= ~CXGBI_FLAG_ADAPTER_RESET;
> > break;
> > }
> > + cdev_put(cdev);
> > }
> >
> > /**
> > diff --git a/drivers/scsi/cxgbi/libcxgbi.c
b/drivers/scsi/cxgbi/libcxgbi.c
> > index 77ac217..297efaa 100644
> > --- a/drivers/scsi/cxgbi/libcxgbi.c
> > +++ b/drivers/scsi/cxgbi/libcxgbi.c
> > @@ -95,8 +95,10 @@ void cxgbi_device_portmap_cleanup(struct
cxgbi_device *cdev)
> > }
> > EXPORT_SYMBOL_GPL(cxgbi_device_portmap_cleanup);
> >
> > -static inline void cxgbi_device_destroy(struct cxgbi_device *cdev)
> > +void cxgbi_device_destroy(struct kref *kr)
> > {
> > + struct cxgbi_device *cdev = container_of(kr, struct
cxgbi_device, use_count);
> > +
> > log_debug(1 << CXGBI_DBG_DEV,
> > "cdev 0x%p, p# %u.\n", cdev, cdev->nports);
> > cxgbi_hbas_remove(cdev);
> > @@ -111,6 +113,7 @@ static inline void cxgbi_device_destroy(struct
cxgbi_device *cdev)
> > cxgbi_free_big_mem(cdev->pmap.port_csk);
> > kfree(cdev);
> > }
> > +EXPORT_SYMBOL_GPL(cxgbi_device_destroy);
>
> There's no cross module use, is there? so no need of an export.
I suppose not. Sorry, was thinking that libcxgbi had common code that
wen't in
its own module.
>
> > struct cxgbi_device *cxgbi_device_register(unsigned int extra,
> > unsigned int nports)
> > @@ -125,6 +128,7 @@ struct cxgbi_device
*cxgbi_device_register(unsigned int extra,
> > pr_warn("nport %d, OOM.\n", nports);
> > return NULL;
> > }
> > + kref_init(&cdev->use_count);
> > cdev->ports = (struct net_device **)(cdev + 1);
> > cdev->hbas = (struct cxgbi_hba **)(((char*)cdev->ports) + nports
*
> > sizeof(struct net_device
*));
> > @@ -151,7 +155,7 @@ void cxgbi_device_unregister(struct cxgbi_device
*cdev)
> > mutex_lock(&cdev_mutex);
> > list_del(&cdev->list_head);
> > mutex_unlock(&cdev_mutex);
> > - cxgbi_device_destroy(cdev);
> > + cdev_put(cdev);
> > }
> > EXPORT_SYMBOL_GPL(cxgbi_device_unregister);
> >
> > @@ -167,7 +171,7 @@ void cxgbi_device_unregister_all(unsigned int
flag)
> > cdev, cdev->nports, cdev->nports ?
> > cdev->ports[0]->name : "");
> > list_del(&cdev->list_head);
> > - cxgbi_device_destroy(cdev);
> > + cdev_put(cdev);
> > }
> > }
> > mutex_unlock(&cdev_mutex);
> > @@ -181,6 +185,9 @@ struct cxgbi_device
*cxgbi_device_find_by_lldev(void *lldev)
> > mutex_lock(&cdev_mutex);
> > list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
> > if (cdev->lldev == lldev) {
> > + if (cdev->flags & CXGBI_FLAG_ADAPTER_RESET)
> > + continue;
> > + cdev_hold(cdev);
> > mutex_unlock(&cdev_mutex);
> > return cdev;
> > }
> > @@ -210,7 +217,10 @@ static struct cxgbi_device
*cxgbi_device_find_by_netdev(struct net_device *ndev,
> > list_for_each_entry_safe(cdev, tmp, &cdev_list, list_head) {
> > for (i = 0; i < cdev->nports; i++) {
> > if (ndev == cdev->ports[i]) {
> > + if (cdev->flags &
CXGBI_FLAG_ADAPTER_RESET)
> > + continue;
> > cdev->hbas[i]->vdev = vdev;
> > + cdev_hold(cdev);
> > mutex_unlock(&cdev_mutex);
> > if (port)
> > *port = i;
> > @@ -469,7 +479,7 @@ static struct cxgbi_sock
*cxgbi_check_route(struct sockaddr *dst_addr)
> > struct sockaddr_in *daddr = (struct sockaddr_in *)dst_addr;
> > struct dst_entry *dst;
> > struct net_device *ndev;
> > - struct cxgbi_device *cdev;
> > + struct cxgbi_device *cdev = NULL;
> > struct rtable *rt = NULL;
> > struct flowi4 fl4;
> > struct cxgbi_sock *csk = NULL;
> > @@ -542,6 +552,8 @@ rel_rt:
> > if (csk)
> > cxgbi_sock_closed(csk);
> > err_out:
> > + if (cdev)
> > + cdev_put(cdev);
> > return ERR_PTR(err);
> > }
> >
> > @@ -2491,6 +2503,7 @@ struct iscsi_endpoint *cxgbi_ep_connect(struct
Scsi_Host *shost,
> > return ep;
> >
> > release_conn:
> > + cdev_put(csk->cdev);
>
> This isn't right. an embedded device reference should be released in
> the csk release function (cxgbi_sock_free).
>
Ok, I can fix that
> > cxgbi_sock_put(csk);
> > cxgbi_sock_closed(csk);
> > err_out:
> > diff --git a/drivers/scsi/cxgbi/libcxgbi.h
b/drivers/scsi/cxgbi/libcxgbi.h
> > index 9267844..08788ac 100644
> > --- a/drivers/scsi/cxgbi/libcxgbi.h
> > +++ b/drivers/scsi/cxgbi/libcxgbi.h
> > @@ -24,6 +24,7 @@
> > #include <linux/scatterlist.h>
> > #include <linux/skbuff.h>
> > #include <linux/vmalloc.h>
> > +#include <linux/kref.h>
> > #include <scsi/scsi_device.h>
> > #include <scsi/libiscsi_tcp.h>
> >
> > @@ -514,6 +515,7 @@ struct cxgbi_device {
> > unsigned int flags;
> > struct net_device **ports;
> > void *lldev;
> > + struct kref use_count;
> > struct cxgbi_hba **hbas;
> > const unsigned short *mtus;
> > unsigned char nmtus;
> > @@ -557,6 +559,8 @@ struct cxgbi_device {
> > void *dd_data;
> > };
> > #define cxgbi_cdev_priv(cdev) ((cdev)->dd_data)
> > +#define cdev_hold(x) do {kref_get(&(x)->use_count);} while(0)
>
> hold is non standard, use put instead.
>
You mean get? I can do that.
Given what I noted above, do you think its worth pursuing this? Or does
this
need a larger fix than what a refcount can offer. We want to be able to
tdel_lldev to NULL in offload_close safely, when there are no other
users of
that structure. I can't really see a way to do it without completely
overhauling how this code works. I suppose we could do some rcu magic
to do the
serialization, but I'm nto sure that would be much better. Let me know
if
you're ok with the above wait loop, and if so, I'll make the other
corrections
you've noted and repost
Regards
Neil
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3)
2011-08-11 19:11 ` Neil Horman
2011-08-11 19:47 ` Karen Xie
@ 2011-08-11 20:18 ` James Bottomley
2011-08-12 20:22 ` Neil Horman
1 sibling, 1 reply; 7+ messages in thread
From: James Bottomley @ 2011-08-11 20:18 UTC (permalink / raw)
To: Neil Horman
Cc: linux-scsi@vger.kernel.org, Divy Le Ray, Steve Wise, Karen Xie,
Mike Christie, stable-commits@vger.kernel.org
On Thu, 2011-08-11 at 15:11 -0400, Neil Horman wrote:
> On Thu, Aug 11, 2011 at 06:36:10PM +0000, James Bottomley wrote:
> > > +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > > @@ -1303,9 +1303,13 @@ static void cxgb3i_dev_close(struct t3cdev *t3dev)
> > >
> > > if (!cdev || cdev->flags & CXGBI_FLAG_ADAPTER_RESET) {
> > > pr_info("0x%p close, f 0x%x.\n", cdev, cdev ? cdev->flags : 0);
> > > + if (cdev)
> > > + cdev_put(cdev);
> > > + while (cdev && atomic_read(&cdev->use_count.refcount) > 1)
> > > + msleep(1);
> >
> > OK, so this was what leapt out the last time as a big no-no. Firstly,
> > do you really have to wait for destruction? If so, use a completion.
> >
> So, this is really the crux of the issue. I'm sorry If it wasn't sufficiently
> clear in the explination above. Let me try again:
>
> The root of the problem is the racy assignments of ->lldev in offload_open and
> ofload_close. cdev->lldev is dereferenced in init_act_open every time a new
> iscsi connection is established. If we encounter an adapter error and go down
> the cxgb3 EEH path, we wind up in offload_close, which sets->lldev to NULL. If
> this occurs while a connection is being established in init_act_open, we can get
> a NULL pointer as ->lldev, and we get the oops above.
>
> So my approach to fix this was to serialize the setting of cdev->lldev to NULL behind
> a guarantee that there are no current users of cdev. Since the cxgb3i driver
> only grabs cdevs via cxgbi_find_by_lldev or find_by_netdev, a reference count
> seemed like the way to go. Not sure I could do the same efficiently with a
> completion.
So this basically means you use the above code to hold offload_close()
at cxgb3_remove_clients() until there are no more references ... which
is an undefined length of time.
That's actually rather a horrible hack.
Why does offload_close() have to NULL out tdev->lldev? Since the model
is effectively refcounted
> I had also thought about moving the NULL setting into part of the release
> routine of kref_put (or previously my raw atomic use), but since that code (or
> the code that accesses it isn't at all serialized with a spinlock, I figured a
> reference count would be more efficient. It all boils down (In my mind) to the
> fact that we're just modifying a pointer here, not actually dstroying a struct.
>
> Truthfully, the way this code works probably needs a larger re-working than what
> I'm doing here, given how often I got turned around in it, but I'm nowhere near
> familiar enough with this driver to do that effectively. I'm just trying to fix
> the oops at hand.
Can't this all be fixed just by making init_act_open() check for a NULL
t3 dev and return error if it is?
This is effectively equivalent to a cancellation path. Now that's not
always possible, so if it's not possible, the rework would have to
refcount around lldev I think.
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3)
2011-08-11 19:47 ` Karen Xie
@ 2011-08-11 20:25 ` Neil Horman
0 siblings, 0 replies; 7+ messages in thread
From: Neil Horman @ 2011-08-11 20:25 UTC (permalink / raw)
To: Karen Xie
Cc: James Bottomley, linux-scsi, Divy Le Ray, Steve Wise,
Mike Christie, stable-commits
On Thu, Aug 11, 2011 at 12:47:23PM -0700, Karen Xie wrote:
> Hi, Neil,
>
> Do you mind if I take this patch and modify it for both cxgb3i/4i
> drivers?
>
If you would like, by all means, go ahead
Thanks
Neil
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3)
2011-08-11 20:18 ` James Bottomley
@ 2011-08-12 20:22 ` Neil Horman
0 siblings, 0 replies; 7+ messages in thread
From: Neil Horman @ 2011-08-12 20:22 UTC (permalink / raw)
To: James Bottomley
Cc: linux-scsi@vger.kernel.org, Divy Le Ray, Steve Wise, Karen Xie,
Mike Christie, stable-commits@vger.kernel.org
On Thu, Aug 11, 2011 at 03:18:03PM -0500, James Bottomley wrote:
> On Thu, 2011-08-11 at 15:11 -0400, Neil Horman wrote:
> > On Thu, Aug 11, 2011 at 06:36:10PM +0000, James Bottomley wrote:
> > > > +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> > > > @@ -1303,9 +1303,13 @@ static void cxgb3i_dev_close(struct t3cdev *t3dev)
> > > >
> > > > if (!cdev || cdev->flags & CXGBI_FLAG_ADAPTER_RESET) {
> > > > pr_info("0x%p close, f 0x%x.\n", cdev, cdev ? cdev->flags : 0);
> > > > + if (cdev)
> > > > + cdev_put(cdev);
> > > > + while (cdev && atomic_read(&cdev->use_count.refcount) > 1)
> > > > + msleep(1);
> > >
> > > OK, so this was what leapt out the last time as a big no-no. Firstly,
> > > do you really have to wait for destruction? If so, use a completion.
> > >
> > So, this is really the crux of the issue. I'm sorry If it wasn't sufficiently
> > clear in the explination above. Let me try again:
> >
> > The root of the problem is the racy assignments of ->lldev in offload_open and
> > ofload_close. cdev->lldev is dereferenced in init_act_open every time a new
> > iscsi connection is established. If we encounter an adapter error and go down
> > the cxgb3 EEH path, we wind up in offload_close, which sets->lldev to NULL. If
> > this occurs while a connection is being established in init_act_open, we can get
> > a NULL pointer as ->lldev, and we get the oops above.
> >
> > So my approach to fix this was to serialize the setting of cdev->lldev to NULL behind
> > a guarantee that there are no current users of cdev. Since the cxgb3i driver
> > only grabs cdevs via cxgbi_find_by_lldev or find_by_netdev, a reference count
> > seemed like the way to go. Not sure I could do the same efficiently with a
> > completion.
>
> So this basically means you use the above code to hold offload_close()
> at cxgb3_remove_clients() until there are no more references ... which
> is an undefined length of time.
>
> That's actually rather a horrible hack.
>
> Why does offload_close() have to NULL out tdev->lldev? Since the model
> is effectively refcounted
>
> > I had also thought about moving the NULL setting into part of the release
> > routine of kref_put (or previously my raw atomic use), but since that code (or
> > the code that accesses it isn't at all serialized with a spinlock, I figured a
> > reference count would be more efficient. It all boils down (In my mind) to the
> > fact that we're just modifying a pointer here, not actually dstroying a struct.
> >
> > Truthfully, the way this code works probably needs a larger re-working than what
> > I'm doing here, given how often I got turned around in it, but I'm nowhere near
> > familiar enough with this driver to do that effectively. I'm just trying to fix
> > the oops at hand.
>
> Can't this all be fixed just by making init_act_open() check for a NULL
> t3 dev and return error if it is?
>
> This is effectively equivalent to a cancellation path. Now that's not
> always possible, so if it's not possible, the rework would have to
> refcount around lldev I think.
>
> James
>
>
>
So I've spent the day looking at this, and I'd like to rescind the whole thing.
I apologize, its completely wrong. After looking at all the code more closely I
found that both struct cxgbi_device structs and t3cdev structs have lldev
members the former lldev member points to the latters type. Its all horridly
confusing. At any rate, Its really the t3cdev->l2opt I think that needs
refcounting. I'm going to rework the whole thing when I get back.
Regards
Neil
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-08-12 20:23 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-11 18:06 [PATCH] cxgb3i: ref count cdev access to prevent modification while in use (v3) Neil Horman
2011-08-11 18:36 ` James Bottomley
2011-08-11 19:11 ` Neil Horman
2011-08-11 19:47 ` Karen Xie
2011-08-11 20:25 ` Neil Horman
2011-08-11 20:18 ` James Bottomley
2011-08-12 20:22 ` Neil Horman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox