* Re: [PATCH] qla_target: Check refcount in find_sess_by_* [not found] ` <20120517162600.GA18067@logfs.org> @ 2012-05-17 19:46 ` Nicholas A. Bellinger 2012-05-17 20:49 ` Roland Dreier 0 siblings, 1 reply; 13+ messages in thread From: Nicholas A. Bellinger @ 2012-05-17 19:46 UTC (permalink / raw) To: Jörn Engel; +Cc: target-devel, Arun Easi, linux-scsi On Thu, 2012-05-17 at 12:26 -0400, Jörn Engel wrote: > As the comment explains, we have a race between dropping the refcount to > zero and actually removing the session from lookup tables. Plug this by > checking the refcount on lookup. > > Signed-off-by: Joern Engel <joern@logfs.org> > --- > drivers/scsi/qla2xxx/tcm_qla2xxx.c | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > Hi Joern! > diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > index 09a70aa..b1bfa08 100644 > --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c > +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > @@ -1300,6 +1300,20 @@ static struct qla_tgt_sess *tcm_qla2xxx_find_sess_by_s_id( > return NULL; > } > > + /* > + * We drop the hardware_lock between dropping the reference count to > + * zero and removing the session from the btree. Therefore it > + * is possible to successfully lookup a session after it has been > + * scheduled for removal, but before actual removal. Result is pretty > + * good chance of dereferencing freed memory in qlt_issue_task_mgmt. > + * > + * Proper fix would be to atomically drop the refcount and remove the > + * session from all data structures that would allow lookup. Until > + * then this tasteless hack shall suffice. > + */ > + if (atomic_read(&nacl->qla_tgt_sess->se_sess->sess_kref.refcount) == 0) > + return NULL; > + > return nacl->qla_tgt_sess; > } > Mmmmm, I'd still like to try to avoid (another) atomic op in the fast path w/ hardware_lock held if at all possible.. Also thinking more about the case in your comment, I'm starting to wonder if tcm_qla2xxx_free_session() should be obtaining hardware_lock + clearing the s_id and loop_id lookup entries earlier than currently done.. That is, target_wait_for_sess_cmds() is called before clearing the nodeacl pointers from s_id + loop_id in tcm_qla2xxx_free_session(), which means that we still perform se_node_acl lookups in the case where the individual tcm_qla2xxx endpoint was not also being shutdown as well.. So that said, I think your work-around is OK to address the current BUG, but the s_id + loop_id clearing order in tcm_qla2xxx_free_session() would be having an effect here as well.. --nab ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-17 19:46 ` [PATCH] qla_target: Check refcount in find_sess_by_* Nicholas A. Bellinger @ 2012-05-17 20:49 ` Roland Dreier 2012-05-17 19:28 ` Jörn Engel 2012-05-17 21:24 ` Nicholas A. Bellinger 0 siblings, 2 replies; 13+ messages in thread From: Roland Dreier @ 2012-05-17 20:49 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: Jörn Engel, target-devel, Arun Easi, linux-scsi > Mmmmm, I'd still like to try to avoid (another) atomic op in the fast > path w/ hardware_lock held if at all possible.. atomic_read() is not an atomic op. However this patch is not right, see below. > Also thinking more about the case in your comment, I'm starting to > wonder if tcm_qla2xxx_free_session() should be obtaining hardware_lock + > clearing the s_id and loop_id lookup entries earlier than currently > done.. > > That is, target_wait_for_sess_cmds() is called before clearing the > nodeacl pointers from s_id + loop_id in tcm_qla2xxx_free_session(), > which means that we still perform se_node_acl lookups in the case where > the individual tcm_qla2xxx endpoint was not also being shutdown as > well.. > > So that said, I think your work-around is OK to address the current BUG, > but the s_id + loop_id clearing order in tcm_qla2xxx_free_session() > would be having an effect here as well.. Yes, in fact we need to clear the session from all lookup tables even earlier than ->free_session. We need to make sure that as soon as target_release_session() is called, that session will no longer be used for any new commands. So qla_tgt_unreg_sess() needs to clear the session from the "by loop_id" table as well as the "by fc_id" table. - R. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-17 20:49 ` Roland Dreier @ 2012-05-17 19:28 ` Jörn Engel 2012-05-17 21:05 ` Jörn Engel 2012-05-17 21:24 ` Nicholas A. Bellinger 1 sibling, 1 reply; 13+ messages in thread From: Jörn Engel @ 2012-05-17 19:28 UTC (permalink / raw) To: Roland Dreier; +Cc: Nicholas A. Bellinger, target-devel, Arun Easi, linux-scsi On Thu, 17 May 2012 13:49:38 -0700, Roland Dreier wrote: > > > Mmmmm, I'd still like to try to avoid (another) atomic op in the fast > > path w/ hardware_lock held if at all possible.. > > atomic_read() is not an atomic op. Or more importantly it doesn't force the cacheline out of shared mode, which is the performance killer to avoid. > > Also thinking more about the case in your comment, I'm starting to > > wonder if tcm_qla2xxx_free_session() should be obtaining hardware_lock + > > clearing the s_id and loop_id lookup entries earlier than currently > > done.. > > > > That is, target_wait_for_sess_cmds() is called before clearing the > > nodeacl pointers from s_id + loop_id in tcm_qla2xxx_free_session(), > > which means that we still perform se_node_acl lookups in the case where > > the individual tcm_qla2xxx endpoint was not also being shutdown as > > well.. > > > > So that said, I think your work-around is OK to address the current BUG, > > but the s_id + loop_id clearing order in tcm_qla2xxx_free_session() > > would be having an effect here as well.. > > Yes, in fact we need to clear the session from all lookup tables > even earlier than ->free_session. We need to make sure that as > soon as target_release_session() is called, that session will no > longer be used for any new commands. > > So qla_tgt_unreg_sess() needs to clear the session from the > "by loop_id" table as well as the "by fc_id" table. Not even that is sufficient. We need to atomically drop the refcount and remove the session from all lookup tables. Call stack is: tcm_qla2xxx_free_session <-- removes from lookup tables qla_tgt_free_session_done --- <-- move to workqueue, dropping hardware_lock qla_tgt_unreg_sess tcm_qla2xxx_close_session <-- takes hardware_lock target_release_session target_put_session <-- drops refcount Since the lookup tables are protected by hardware_lock, we would have to either take the hardware_lock _before_ dropping the reference count or do atomic_dec_and_lock(). Both of which requires knowledge about the qla2xxx hardware_lock in generic target code. Jörn -- Tough times don't last, but tough people do. -- Nigerian proverb ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-17 19:28 ` Jörn Engel @ 2012-05-17 21:05 ` Jörn Engel 2012-05-17 21:10 ` Jörn Engel 2012-05-18 2:02 ` Nicholas A. Bellinger 0 siblings, 2 replies; 13+ messages in thread From: Jörn Engel @ 2012-05-17 21:05 UTC (permalink / raw) To: Roland Dreier Cc: Nicholas A. Bellinger, target-devel, Arun Easi, linux-scsi, Steve Hodgson On Thu, 17 May 2012 15:28:30 -0400, Jörn Engel wrote: > > tcm_qla2xxx_free_session <-- removes from lookup tables > qla_tgt_free_session_done > --- <-- move to workqueue, dropping hardware_lock > qla_tgt_unreg_sess > tcm_qla2xxx_close_session <-- takes hardware_lock > target_release_session > target_put_session <-- drops refcount > > Since the lookup tables are protected by hardware_lock, we would have > to either take the hardware_lock _before_ dropping the reference count > or do atomic_dec_and_lock(). Both of which requires knowledge about > the qla2xxx hardware_lock in generic target code. And below is a patch of RFC quality, at best. Assuming it actually does what it should do, this is still insufficient. tcm_qla2xxx_find_sess_by_* currently don't take a refcount and in particular the qla_tgt_reset() path will happily dereference the session without a refcount. Steve is currently working on that side. Anyway, are there any other problems with the patch that I have missed? Jörn -- You ain't got no problem, Jules. I'm on the motherfucker. Go back in there, chill them niggers out and wait for the Wolf, who should be coming directly. -- Marsellus Wallace [PATCH] qla_target: hold hardware_lock when dropping session refcount Signed-off-by: Joern Engel <joern@logfs.org> --- drivers/scsi/qla2xxx/tcm_qla2xxx.c | 22 +++++++++++++++++++++- drivers/target/iscsi/iscsi_target.c | 6 +++--- drivers/target/iscsi/iscsi_target_login.c | 4 ++-- drivers/target/target_core_tpg.c | 4 ++-- drivers/target/target_core_transport.c | 6 +++--- include/target/target_core_fabric.h | 3 ++- 6 files changed, 33 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index 759c65b..bfdd6f2 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c @@ -934,9 +934,28 @@ void tcm_qla2xxx_clear_nacl_from_fcport_map(struct qla_tgt_sess *sess) nacl->nport_id); } +static void tcm_qla2xxx_release_session(struct kref *kref) +{ + struct se_session *se_sess = container_of(kref, + struct se_session, sess_kref); + + tcm_qla2xxx_close_session(se_sess); +} + +static void tcm_qla2xxx_put_session(struct se_session *se_sess) +{ + struct qla_tgt_sess *sess = se_sess->fabric_sess_ptr; + struct qla_hw_data *ha = sess->vha->hw; + unsigned long flags; + + lock_hardware(ha, &flags); + kref_put(&se_sess->sess_kref, tcm_qla2xxx_release_session); + unlock_hardware(ha, &flags); +} + void tcm_qla2xxx_put_sess(struct qla_tgt_sess *sess) { - target_put_session(sess->se_sess); + tcm_qla2xxx_put_session(sess->se_sess); } void tcm_qla2xxx_shutdown_sess(struct qla_tgt_sess *sess) @@ -1833,6 +1852,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_ops = { .new_cmd_map = NULL, .check_stop_free = tcm_qla2xxx_check_stop_free, .release_cmd = tcm_qla2xxx_release_cmd, + .put_session = tcm_qla2xxx_put_session, .shutdown_session = tcm_qla2xxx_shutdown_session, .close_session = tcm_qla2xxx_close_session, .stop_session = tcm_qla2xxx_stop_session, diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index ebb835d..47fcd3a 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -4159,7 +4159,7 @@ int iscsit_close_connection( if (!atomic_read(&sess->session_reinstatement) && atomic_read(&sess->session_fall_back_to_erl0)) { spin_unlock_bh(&sess->conn_lock); - target_put_session(sess->se_sess); + generic_target_put_session(sess->se_sess); return 0; } else if (atomic_read(&sess->session_logout)) { @@ -4280,7 +4280,7 @@ static void iscsit_logout_post_handler_closesession( iscsit_dec_conn_usage_count(conn); iscsit_stop_session(sess, 1, 1); iscsit_dec_session_usage_count(sess); - target_put_session(sess->se_sess); + generic_target_put_session(sess->se_sess); } static void iscsit_logout_post_handler_samecid( @@ -4446,7 +4446,7 @@ int iscsit_free_session(struct iscsi_session *sess) } else spin_unlock_bh(&sess->conn_lock); - target_put_session(sess->se_sess); + generic_target_put_session(sess->se_sess); return 0; } diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c index 02bea10..046c102 100644 --- a/drivers/target/iscsi/iscsi_target_login.c +++ b/drivers/target/iscsi/iscsi_target_login.c @@ -180,7 +180,7 @@ int iscsi_check_for_session_reinstatement(struct iscsi_conn *conn) if (sess->session_state == TARG_SESS_STATE_FAILED) { spin_unlock_bh(&sess->conn_lock); iscsit_dec_session_usage_count(sess); - target_put_session(sess->se_sess); + generic_target_put_session(sess->se_sess); return 0; } spin_unlock_bh(&sess->conn_lock); @@ -188,7 +188,7 @@ int iscsi_check_for_session_reinstatement(struct iscsi_conn *conn) iscsit_stop_session(sess, 1, 1); iscsit_dec_session_usage_count(sess); - target_put_session(sess->se_sess); + generic_target_put_session(sess->se_sess); return 0; } diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index 70c3ffb..dfc1b1b 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -510,10 +510,10 @@ int core_tpg_del_initiator_node_acl( list_del(&sess->sess_acl_list); rc = tpg->se_tpg_tfo->shutdown_session(sess); - target_put_session(sess); + tpg->se_tpg_tfo->put_session(sess); if (!rc) continue; - target_put_session(sess); + tpg->se_tpg_tfo->put_session(sess); } target_put_nacl(acl); /* diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 50cb798..63089b4 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -318,7 +318,7 @@ void transport_register_session( } EXPORT_SYMBOL(transport_register_session); -static void target_release_session(struct kref *kref) +void target_release_session(struct kref *kref) { struct se_session *se_sess = container_of(kref, struct se_session, sess_kref); @@ -333,11 +333,11 @@ void target_get_session(struct se_session *se_sess) } EXPORT_SYMBOL(target_get_session); -int target_put_session(struct se_session *se_sess) +int generic_target_put_session(struct se_session *se_sess) { return kref_put(&se_sess->sess_kref, target_release_session); } -EXPORT_SYMBOL(target_put_session); +EXPORT_SYMBOL(generic_target_put_session); static void target_complete_nacl(struct kref *kref) { diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h index b5927f1..ede0c84 100644 --- a/include/target/target_core_fabric.h +++ b/include/target/target_core_fabric.h @@ -51,6 +51,7 @@ struct target_core_fabric_ops { */ int (*check_release_cmd)(struct se_cmd *); void (*release_cmd)(struct se_cmd *); + void (*put_session)(struct se_session *se_sess); /* * Called with spin_lock_bh(struct se_portal_group->session_lock held. */ @@ -107,7 +108,7 @@ void __transport_register_session(struct se_portal_group *, void transport_register_session(struct se_portal_group *, struct se_node_acl *, struct se_session *, void *); void target_get_session(struct se_session *); -int target_put_session(struct se_session *); +int generic_target_put_session(struct se_session *); void target_session_i_t_nexus(struct se_session *, const u8 **, size_t *, const u8 **, size_t *); void transport_free_session(struct se_session *); -- 1.7.10 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-17 21:05 ` Jörn Engel @ 2012-05-17 21:10 ` Jörn Engel 2012-05-18 2:02 ` Nicholas A. Bellinger 1 sibling, 0 replies; 13+ messages in thread From: Jörn Engel @ 2012-05-17 21:10 UTC (permalink / raw) To: Roland Dreier Cc: Nicholas A. Bellinger, target-devel, Arun Easi, linux-scsi, Steve Hodgson And below is a patch that allows speeding up the fast path a bit. The code in tcm_qla2xxx_put_session() would turn into this: local_irq_save(flags); if (kref_put_and_lock(&se_sess->sess_kref, &ha->hardware_lock, target_release_session)) spin_unlock(&ha->hardware_lock); local_irq_restore(flags); I don't propose we do this yet, correct code is more important than fast code at this point. But we should be able to reclaim most of the performance impact - if it is actually measureable. Jörn -- The grand essentials of happiness are: something to do, something to love, and something to hope for. -- Allan K. Chalmers [PATCH] kref: add kref_put_and_lock() Similar to kref_put(), but will only take a lock (and a performance hit) if the refcount drops to zero. Signed-off-by: Joern Engel <joern@logfs.org> --- include/linux/kref.h | 3 +++ lib/kref.c | 23 +++++++++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/include/linux/kref.h b/include/linux/kref.h index d4a62ab..9581493 100644 --- a/include/linux/kref.h +++ b/include/linux/kref.h @@ -16,6 +16,7 @@ #define _KREF_H_ #include <linux/types.h> +#include <linux/spinlock.h> struct kref { atomic_t refcount; @@ -24,6 +25,8 @@ struct kref { void kref_init(struct kref *kref); void kref_get(struct kref *kref); int kref_put(struct kref *kref, void (*release) (struct kref *kref)); +int kref_put_and_lock(struct kref *kref, spinlock_t *lock, + void (*release)(struct kref *kref)); int kref_sub(struct kref *kref, unsigned int count, void (*release) (struct kref *kref)); diff --git a/lib/kref.c b/lib/kref.c index 3efb882..07a82f3 100644 --- a/lib/kref.c +++ b/lib/kref.c @@ -62,6 +62,29 @@ int kref_put(struct kref *kref, void (*release)(struct kref *kref)) return 0; } +/** + * kref_put_and_lock - decrement refcount for object with locking + * @kref: object + * @lock: the lock to acquire when dropping the last refcount + * @release: pointer to the function that will clean up the object when the + * last reference to the object is released. + * + * Same as kref_put(), except that it will acquire the lock before dropping the + * last refcount. If the last refcount is dropped, the lock will be held on + * return and the return value will be 1. + */ +int kref_put_and_lock(struct kref *kref, spinlock_t *lock, + void (*release)(struct kref *kref)) +{ + WARN_ON(release == NULL); + WARN_ON(release == (void (*)(struct kref *))kfree); + + if (atomic_dec_and_lock(&kref->refcount, lock)) { + release(kref); + return 1; + } + return 0; +} /** * kref_sub - subtract a number of refcounts for object. -- 1.7.10 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-17 21:05 ` Jörn Engel 2012-05-17 21:10 ` Jörn Engel @ 2012-05-18 2:02 ` Nicholas A. Bellinger 2012-05-18 1:15 ` Jörn Engel 1 sibling, 1 reply; 13+ messages in thread From: Nicholas A. Bellinger @ 2012-05-18 2:02 UTC (permalink / raw) To: Jörn Engel Cc: Roland Dreier, target-devel, Arun Easi, linux-scsi, Steve Hodgson On Thu, 2012-05-17 at 17:05 -0400, Jörn Engel wrote: > On Thu, 17 May 2012 15:28:30 -0400, Jörn Engel wrote: > > > > tcm_qla2xxx_free_session <-- removes from lookup tables > > qla_tgt_free_session_done > > --- <-- move to workqueue, dropping hardware_lock > > qla_tgt_unreg_sess > > tcm_qla2xxx_close_session <-- takes hardware_lock > > target_release_session > > target_put_session <-- drops refcount > > > > Since the lookup tables are protected by hardware_lock, we would have > > to either take the hardware_lock _before_ dropping the reference count > > or do atomic_dec_and_lock(). Both of which requires knowledge about > > the qla2xxx hardware_lock in generic target code. > > And below is a patch of RFC quality, at best. Assuming it actually > does what it should do, this is still insufficient. > tcm_qla2xxx_find_sess_by_* currently don't take a refcount and in > particular the qla_tgt_reset() path will happily dereference the > session without a refcount. Steve is currently working on that side. > > Anyway, are there any other problems with the patch that I have > missed? > Hey Joern, I think this patch to add a TFO->put_session() special case for tcm_qla2xxx (and possibly ib_srpt as well) is the better approach in order to handle when a fabric specific hardware lock needs to be taken around the final se_session kref put. My comments are below.. > Jörn > > -- > You ain't got no problem, Jules. I'm on the motherfucker. Go back in > there, chill them niggers out and wait for the Wolf, who should be > coming directly. > -- Marsellus Wallace > > [PATCH] qla_target: hold hardware_lock when dropping session refcount > > Signed-off-by: Joern Engel <joern@logfs.org> > --- > drivers/scsi/qla2xxx/tcm_qla2xxx.c | 22 +++++++++++++++++++++- > drivers/target/iscsi/iscsi_target.c | 6 +++--- > drivers/target/iscsi/iscsi_target_login.c | 4 ++-- > drivers/target/target_core_tpg.c | 4 ++-- > drivers/target/target_core_transport.c | 6 +++--- > include/target/target_core_fabric.h | 3 ++- > 6 files changed, 33 insertions(+), 12 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > index 759c65b..bfdd6f2 100644 > --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c > +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > @@ -934,9 +934,28 @@ void tcm_qla2xxx_clear_nacl_from_fcport_map(struct qla_tgt_sess *sess) > nacl->nport_id); > } > > +static void tcm_qla2xxx_release_session(struct kref *kref) > +{ > + struct se_session *se_sess = container_of(kref, > + struct se_session, sess_kref); > + > + tcm_qla2xxx_close_session(se_sess); > +} > + Deadlock here.. tcm_qla2xxx_close_session() takes the hardware_lock before calling qlt_unreg_sess().. > +static void tcm_qla2xxx_put_session(struct se_session *se_sess) > +{ > + struct qla_tgt_sess *sess = se_sess->fabric_sess_ptr; > + struct qla_hw_data *ha = sess->vha->hw; > + unsigned long flags; > + > + lock_hardware(ha, &flags); > + kref_put(&se_sess->sess_kref, tcm_qla2xxx_release_session); > + unlock_hardware(ha, &flags); > +} > + I'm not sure what wrapper lock_hardware + unlock_hardware looks like..? What special does it do beyond just obtaining ha->hardware_lock..? To build in lio-core there where just changed to normal irqsave() -> irqrestore() w/ ->hardware_lock > void tcm_qla2xxx_put_sess(struct qla_tgt_sess *sess) > { > - target_put_session(sess->se_sess); > + tcm_qla2xxx_put_session(sess->se_sess); > } > > void tcm_qla2xxx_shutdown_sess(struct qla_tgt_sess *sess) > @@ -1833,6 +1852,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_ops = { > .new_cmd_map = NULL, > .check_stop_free = tcm_qla2xxx_check_stop_free, > .release_cmd = tcm_qla2xxx_release_cmd, > + .put_session = tcm_qla2xxx_put_session, > .shutdown_session = tcm_qla2xxx_shutdown_session, > .close_session = tcm_qla2xxx_close_session, > .stop_session = tcm_qla2xxx_stop_session, <SNIP> No need to rename iscsi-target stuff now, as it does not need ->put_session() or change it's current operation using target_put_session() for existing users.. > diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c > index 70c3ffb..dfc1b1b 100644 > --- a/drivers/target/target_core_tpg.c > +++ b/drivers/target/target_core_tpg.c > @@ -510,10 +510,10 @@ int core_tpg_del_initiator_node_acl( > list_del(&sess->sess_acl_list); > > rc = tpg->se_tpg_tfo->shutdown_session(sess); > - target_put_session(sess); > + tpg->se_tpg_tfo->put_session(sess); > if (!rc) > continue; > - target_put_session(sess); > + tpg->se_tpg_tfo->put_session(sess); > } > target_put_nacl(acl); > /* Note this breaks all the other fabric cases that don't provide a TFO->put_session() callback and will currently require a target_put_session() call. This call to TFO->put_sesssion() has been moved into target_put_session() the -v2 patch below.. > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index 50cb798..63089b4 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -318,7 +318,7 @@ void transport_register_session( > } > EXPORT_SYMBOL(transport_register_session); > > -static void target_release_session(struct kref *kref) > +void target_release_session(struct kref *kref) > { > struct se_session *se_sess = container_of(kref, > struct se_session, sess_kref); > @@ -333,11 +333,11 @@ void target_get_session(struct se_session *se_sess) > } > EXPORT_SYMBOL(target_get_session); > > -int target_put_session(struct se_session *se_sess) > +int generic_target_put_session(struct se_session *se_sess) > { > return kref_put(&se_sess->sess_kref, target_release_session); > } > -EXPORT_SYMBOL(target_put_session); > +EXPORT_SYMBOL(generic_target_put_session); > Ok, so here is what I've put together from your original patch that's currently running on top of lio-core.git. It appears to be doing what your original patch was attempting to do, but is still missing the extra refactoring + export of set_sess_by_*() callers as discussed earlier.. WDYT..? --nab diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c index 24b4e45..5aa8372 100644 --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c @@ -845,9 +845,28 @@ static void tcm_qla2xxx_clear_nacl_from_fcport_map(struct qla_tgt_sess *sess) nacl->nport_id); } +static void tcm_qla2xxx_release_session(struct kref *kref) +{ + struct se_session *se_sess = container_of(kref, + struct se_session, sess_kref); + + qlt_unreg_sess(se_sess->fabric_sess_ptr); +} + +static void tcm_qla2xxx_put_session(struct se_session *se_sess) +{ + struct qla_tgt_sess *sess = se_sess->fabric_sess_ptr; + struct qla_hw_data *ha = sess->vha->hw; + unsigned long flags; + + spin_lock_irqsave(&ha->hardware_lock, flags); + kref_put(&se_sess->sess_kref, tcm_qla2xxx_release_session); + spin_unlock_irqrestore(&ha->hardware_lock, flags); +} + static void tcm_qla2xxx_put_sess(struct qla_tgt_sess *sess) { - target_put_session(sess->se_sess); + tcm_qla2xxx_put_session(sess->se_sess); } static void tcm_qla2xxx_shutdown_sess(struct qla_tgt_sess *sess) @@ -1743,6 +1762,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_ops = { .new_cmd_map = NULL, .check_stop_free = tcm_qla2xxx_check_stop_free, .release_cmd = tcm_qla2xxx_release_cmd, + .put_session = tcm_qla2xxx_put_session, .shutdown_session = tcm_qla2xxx_shutdown_session, .close_session = tcm_qla2xxx_close_session, .sess_get_index = tcm_qla2xxx_sess_get_index, @@ -1791,6 +1811,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_npiv_ops = { .tpg_release_fabric_acl = tcm_qla2xxx_release_fabric_acl, .tpg_get_inst_index = tcm_qla2xxx_tpg_get_inst_index, .release_cmd = tcm_qla2xxx_release_cmd, + .put_session = tcm_qla2xxx_put_session, .shutdown_session = tcm_qla2xxx_shutdown_session, .close_session = tcm_qla2xxx_close_session, .sess_get_index = tcm_qla2xxx_sess_get_index, diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index e797986..6b7dc67 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -315,7 +315,7 @@ void transport_register_session( } EXPORT_SYMBOL(transport_register_session); -static void target_release_session(struct kref *kref) +void target_release_session(struct kref *kref) { struct se_session *se_sess = container_of(kref, struct se_session, sess_kref); @@ -332,6 +332,12 @@ EXPORT_SYMBOL(target_get_session); void target_put_session(struct se_session *se_sess) { + struct se_portal_group *tpg = se_sess->se_tpg; + + if (tpg->se_tpg_tfo->put_session != NULL) { + tpg->se_tpg_tfo->put_session(se_sess); + return; + } kref_put(&se_sess->sess_kref, target_release_session); } EXPORT_SYMBOL(target_put_session); diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h index 4e3f7a4..2c2af2b 100644 --- a/include/target/target_core_fabric.h +++ b/include/target/target_core_fabric.h @@ -47,6 +47,7 @@ struct target_core_fabric_ops { */ int (*check_stop_free)(struct se_cmd *); void (*release_cmd)(struct se_cmd *); + void (*put_session)(struct se_session *); /* * Called with spin_lock_bh(struct se_portal_group->session_lock held. */ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-18 2:02 ` Nicholas A. Bellinger @ 2012-05-18 1:15 ` Jörn Engel 2012-05-18 6:09 ` Nicholas A. Bellinger 0 siblings, 1 reply; 13+ messages in thread From: Jörn Engel @ 2012-05-18 1:15 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: Roland Dreier, target-devel, Arun Easi, linux-scsi, Steve Hodgson On Thu, 17 May 2012 19:02:36 -0700, Nicholas A. Bellinger wrote: > > > > +static void tcm_qla2xxx_release_session(struct kref *kref) > > +{ > > + struct se_session *se_sess = container_of(kref, > > + struct se_session, sess_kref); > > + > > + tcm_qla2xxx_close_session(se_sess); > > +} > > + > > Deadlock here.. tcm_qla2xxx_close_session() takes the hardware_lock > before calling qlt_unreg_sess().. Good catch. > > +static void tcm_qla2xxx_put_session(struct se_session *se_sess) > > +{ > > + struct qla_tgt_sess *sess = se_sess->fabric_sess_ptr; > > + struct qla_hw_data *ha = sess->vha->hw; > > + unsigned long flags; > > + > > + lock_hardware(ha, &flags); > > + kref_put(&se_sess->sess_kref, tcm_qla2xxx_release_session); > > + unlock_hardware(ha, &flags); > > +} > > I'm not sure what wrapper lock_hardware + unlock_hardware looks like..? > What special does it do beyond just obtaining ha->hardware_lock..? I have replaced all instances of taking/releasing the hardware_lock in our tree with these wrappers and also added assert_hardware_locked(). We have a rare bug where qla_tgt_reset() holds the hardware_lock and qla_tgt_issue_task_mgmt() does not, as proven by the assertions. By now I have added all sorts of instrumentations in a desperate attempt to understand the bug. Main drawback is that porting patches between our tree and yours is increasingly painful, as you have noticed. > <SNIP> > > No need to rename iscsi-target stuff now, as it does not need > ->put_session() or change it's current operation using > target_put_session() for existing users.. Actually... I renamed the function to generic_target_put_session() so that transports without special needs can use that as their .put_session pointer. In principle the rename isn't needed, but the generic versions of similar callbacks tend to be named generic_foo() elsewhere and I like to follow common patterns. But more importantly, I completely forgot to add the function pointers to all the other transports. That needs fixing... > > diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c > > index 70c3ffb..dfc1b1b 100644 > > --- a/drivers/target/target_core_tpg.c > > +++ b/drivers/target/target_core_tpg.c > > @@ -510,10 +510,10 @@ int core_tpg_del_initiator_node_acl( > > list_del(&sess->sess_acl_list); > > > > rc = tpg->se_tpg_tfo->shutdown_session(sess); > > - target_put_session(sess); > > + tpg->se_tpg_tfo->put_session(sess); > > if (!rc) > > continue; > > - target_put_session(sess); > > + tpg->se_tpg_tfo->put_session(sess); > > } > > target_put_nacl(acl); > > /* > > Note this breaks all the other fabric cases that don't provide a > TFO->put_session() callback and will currently require a > target_put_session() call. > > This call to TFO->put_sesssion() has been moved into > target_put_session() the -v2 patch below.. ...as you also noticed. While your version isn't completely absurd, I learned to hate this pattern the hard way. In filesystem code, there are several dozen function pointers that default to buffer_head code. If the default was to dereference a NULL pointer and get a clean backtrace, developers of new filesystems would notice soon enough, add their own function pointer and move on. But since the default is to assume buffer_heads, you now use a filesystem-specific structure with buffer_head code and the result, often enough, is a fairly subtle corruption that can take several days to track down. The only thing that kept me back from removing those defaults was fear of missing a fix for one of the fourty-odd filesystems linux currently supports. We are fucked because we are so deeply fucked that noone dares to unfuck the situation. So if you try to follow the buffer_head pattern, prepare to dodge some rotten fruit the next few times we meet in person. I really _really_ hate that. I has cost me several weeks of my life that I could have spent being more productive and feeling less pain. <end mad rant> > diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > index 24b4e45..5aa8372 100644 > --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c > +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > @@ -845,9 +845,28 @@ static void tcm_qla2xxx_clear_nacl_from_fcport_map(struct qla_tgt_sess *sess) > nacl->nport_id); > } > > +static void tcm_qla2xxx_release_session(struct kref *kref) > +{ > + struct se_session *se_sess = container_of(kref, > + struct se_session, sess_kref); > + > + qlt_unreg_sess(se_sess->fabric_sess_ptr); > +} > + > +static void tcm_qla2xxx_put_session(struct se_session *se_sess) > +{ > + struct qla_tgt_sess *sess = se_sess->fabric_sess_ptr; > + struct qla_hw_data *ha = sess->vha->hw; > + unsigned long flags; > + > + spin_lock_irqsave(&ha->hardware_lock, flags); > + kref_put(&se_sess->sess_kref, tcm_qla2xxx_release_session); > + spin_unlock_irqrestore(&ha->hardware_lock, flags); > +} > + > static void tcm_qla2xxx_put_sess(struct qla_tgt_sess *sess) > { > - target_put_session(sess->se_sess); > + tcm_qla2xxx_put_session(sess->se_sess); > } > > static void tcm_qla2xxx_shutdown_sess(struct qla_tgt_sess *sess) > @@ -1743,6 +1762,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_ops = { > .new_cmd_map = NULL, > .check_stop_free = tcm_qla2xxx_check_stop_free, > .release_cmd = tcm_qla2xxx_release_cmd, > + .put_session = tcm_qla2xxx_put_session, > .shutdown_session = tcm_qla2xxx_shutdown_session, > .close_session = tcm_qla2xxx_close_session, > .sess_get_index = tcm_qla2xxx_sess_get_index, > @@ -1791,6 +1811,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_npiv_ops = { > .tpg_release_fabric_acl = tcm_qla2xxx_release_fabric_acl, > .tpg_get_inst_index = tcm_qla2xxx_tpg_get_inst_index, > .release_cmd = tcm_qla2xxx_release_cmd, > + .put_session = tcm_qla2xxx_put_session, > .shutdown_session = tcm_qla2xxx_shutdown_session, > .close_session = tcm_qla2xxx_close_session, > .sess_get_index = tcm_qla2xxx_sess_get_index, > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index e797986..6b7dc67 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -315,7 +315,7 @@ void transport_register_session( > } > EXPORT_SYMBOL(transport_register_session); > > -static void target_release_session(struct kref *kref) > +void target_release_session(struct kref *kref) > { > struct se_session *se_sess = container_of(kref, > struct se_session, sess_kref); > @@ -332,6 +332,12 @@ EXPORT_SYMBOL(target_get_session); > > void target_put_session(struct se_session *se_sess) > { > + struct se_portal_group *tpg = se_sess->se_tpg; > + > + if (tpg->se_tpg_tfo->put_session != NULL) { > + tpg->se_tpg_tfo->put_session(se_sess); > + return; > + } > kref_put(&se_sess->sess_kref, target_release_session); > } > EXPORT_SYMBOL(target_put_session); > diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h > index 4e3f7a4..2c2af2b 100644 > --- a/include/target/target_core_fabric.h > +++ b/include/target/target_core_fabric.h > @@ -47,6 +47,7 @@ struct target_core_fabric_ops { > */ > int (*check_stop_free)(struct se_cmd *); > void (*release_cmd)(struct se_cmd *); > + void (*put_session)(struct se_session *); > /* > * Called with spin_lock_bh(struct se_portal_group->session_lock held. > */ Apart from my buffer_head rant, I like it. Thanks! Jörn -- Science is the belief in the ignorance of experts. -- Richard Feynman ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-18 1:15 ` Jörn Engel @ 2012-05-18 6:09 ` Nicholas A. Bellinger 2012-05-18 22:49 ` Jörn Engel 0 siblings, 1 reply; 13+ messages in thread From: Nicholas A. Bellinger @ 2012-05-18 6:09 UTC (permalink / raw) To: Jörn Engel Cc: Roland Dreier, target-devel, Arun Easi, linux-scsi, Steve Hodgson On Thu, 2012-05-17 at 21:15 -0400, Jörn Engel wrote: > On Thu, 17 May 2012 19:02:36 -0700, Nicholas A. Bellinger wrote: <SNIP> > > > > No need to rename iscsi-target stuff now, as it does not need > > ->put_session() or change it's current operation using > > target_put_session() for existing users.. > > Actually... I renamed the function to generic_target_put_session() so > that transports without special needs can use that as their > .put_session pointer. In principle the rename isn't needed, but the > generic versions of similar callbacks tend to be named generic_foo() > elsewhere and I like to follow common patterns. > > But more importantly, I completely forgot to add the function pointers > to all the other transports. That needs fixing... > .... > > > diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c > > > index 70c3ffb..dfc1b1b 100644 > > > --- a/drivers/target/target_core_tpg.c > > > +++ b/drivers/target/target_core_tpg.c > > > @@ -510,10 +510,10 @@ int core_tpg_del_initiator_node_acl( > > > list_del(&sess->sess_acl_list); > > > > > > rc = tpg->se_tpg_tfo->shutdown_session(sess); > > > - target_put_session(sess); > > > + tpg->se_tpg_tfo->put_session(sess); > > > if (!rc) > > > continue; > > > - target_put_session(sess); > > > + tpg->se_tpg_tfo->put_session(sess); > > > } > > > target_put_nacl(acl); > > > /* > > > > Note this breaks all the other fabric cases that don't provide a > > TFO->put_session() callback and will currently require a > > target_put_session() call. > > > > This call to TFO->put_sesssion() has been moved into > > target_put_session() the -v2 patch below.. > > ...as you also noticed. > > While your version isn't completely absurd, I learned to hate this > pattern the hard way. Incremental steps here please.. Adding individual ->put_session() callers using target_put_session() is trivial to change for everyone else once the changes to tcm_qla2xxx are working as expected.. > In filesystem code, there are several dozen > function pointers that default to buffer_head code. If the default > was to dereference a NULL pointer and get a clean backtrace, > developers of new filesystems would notice soon enough, add their own > function pointer and move on. > > But since the default is to assume buffer_heads, you now use a > filesystem-specific structure with buffer_head code and the result, > often enough, is a fairly subtle corruption that can take several days > to track down. The only thing that kept me back from removing those > defaults was fear of missing a fix for one of the fourty-odd > filesystems linux currently supports. We are fucked because we are so > deeply fucked that noone dares to unfuck the situation. > > So if you try to follow the buffer_head pattern, prepare to dodge some > rotten fruit the next few times we meet in person. I really _really_ > hate that. I has cost me several weeks of my life that I could have > spent being more productive and feeling less pain. > We currently don't have any hard requirement for obtaining a fabric specific lock in software drivers like loopback, tcm_fc, and iscsi-target, so this ends up being a NOP back into target_put_session() for handling of the default fabric independent se_sess->sess_kref put shutdown case. So while one might be swayed by your historical arguments above, I really don't see a scenario unfolding here that would unleash a hell of "no one dares to unfuck" the unfuckable as you've so eloquently stated.. > <end mad rant> > > > diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > > index 24b4e45..5aa8372 100644 > > --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c > > +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c > > @@ -845,9 +845,28 @@ static void tcm_qla2xxx_clear_nacl_from_fcport_map(struct qla_tgt_sess *sess) > > nacl->nport_id); > > } > > > > +static void tcm_qla2xxx_release_session(struct kref *kref) > > +{ > > + struct se_session *se_sess = container_of(kref, > > + struct se_session, sess_kref); > > + > > + qlt_unreg_sess(se_sess->fabric_sess_ptr); > > +} > > + > > +static void tcm_qla2xxx_put_session(struct se_session *se_sess) > > +{ > > + struct qla_tgt_sess *sess = se_sess->fabric_sess_ptr; > > + struct qla_hw_data *ha = sess->vha->hw; > > + unsigned long flags; > > + > > + spin_lock_irqsave(&ha->hardware_lock, flags); > > + kref_put(&se_sess->sess_kref, tcm_qla2xxx_release_session); > > + spin_unlock_irqrestore(&ha->hardware_lock, flags); > > +} > > + > > static void tcm_qla2xxx_put_sess(struct qla_tgt_sess *sess) > > { > > - target_put_session(sess->se_sess); > > + tcm_qla2xxx_put_session(sess->se_sess); > > } > > > > static void tcm_qla2xxx_shutdown_sess(struct qla_tgt_sess *sess) > > @@ -1743,6 +1762,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_ops = { > > .new_cmd_map = NULL, > > .check_stop_free = tcm_qla2xxx_check_stop_free, > > .release_cmd = tcm_qla2xxx_release_cmd, > > + .put_session = tcm_qla2xxx_put_session, > > .shutdown_session = tcm_qla2xxx_shutdown_session, > > .close_session = tcm_qla2xxx_close_session, > > .sess_get_index = tcm_qla2xxx_sess_get_index, > > @@ -1791,6 +1811,7 @@ static struct target_core_fabric_ops tcm_qla2xxx_npiv_ops = { > > .tpg_release_fabric_acl = tcm_qla2xxx_release_fabric_acl, > > .tpg_get_inst_index = tcm_qla2xxx_tpg_get_inst_index, > > .release_cmd = tcm_qla2xxx_release_cmd, > > + .put_session = tcm_qla2xxx_put_session, > > .shutdown_session = tcm_qla2xxx_shutdown_session, > > .close_session = tcm_qla2xxx_close_session, > > .sess_get_index = tcm_qla2xxx_sess_get_index, > > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > > index e797986..6b7dc67 100644 > > --- a/drivers/target/target_core_transport.c > > +++ b/drivers/target/target_core_transport.c > > @@ -315,7 +315,7 @@ void transport_register_session( > > } > > EXPORT_SYMBOL(transport_register_session); > > > > -static void target_release_session(struct kref *kref) > > +void target_release_session(struct kref *kref) > > { > > struct se_session *se_sess = container_of(kref, > > struct se_session, sess_kref); > > @@ -332,6 +332,12 @@ EXPORT_SYMBOL(target_get_session); > > > > void target_put_session(struct se_session *se_sess) > > { > > + struct se_portal_group *tpg = se_sess->se_tpg; > > + > > + if (tpg->se_tpg_tfo->put_session != NULL) { > > + tpg->se_tpg_tfo->put_session(se_sess); > > + return; > > + } > > kref_put(&se_sess->sess_kref, target_release_session); > > } > > EXPORT_SYMBOL(target_put_session); > > diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h > > index 4e3f7a4..2c2af2b 100644 > > --- a/include/target/target_core_fabric.h > > +++ b/include/target/target_core_fabric.h > > @@ -47,6 +47,7 @@ struct target_core_fabric_ops { > > */ > > int (*check_stop_free)(struct se_cmd *); > > void (*release_cmd)(struct se_cmd *); > > + void (*put_session)(struct se_session *); > > /* > > * Called with spin_lock_bh(struct se_portal_group->session_lock held. > > */ > > Apart from my buffer_head rant, I like it. Thanks! > So that means your testing with it now, right..? Thanks Joern! --nab ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-18 6:09 ` Nicholas A. Bellinger @ 2012-05-18 22:49 ` Jörn Engel 2012-05-19 1:26 ` Nicholas A. Bellinger 0 siblings, 1 reply; 13+ messages in thread From: Jörn Engel @ 2012-05-18 22:49 UTC (permalink / raw) To: Nicholas A. Bellinger Cc: Roland Dreier, target-devel, Arun Easi, linux-scsi, Steve Hodgson On Thu, 17 May 2012 23:09:27 -0700, Nicholas A. Bellinger wrote: > > So that means your testing with it now, right..? Our kernel has diverged too much from yours to easily move patches back and forth, so no. But more importantly, we still don't have a complete patchset. Jörn -- The only real mistake is the one from which we learn nothing. -- John Powell -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-18 22:49 ` Jörn Engel @ 2012-05-19 1:26 ` Nicholas A. Bellinger 2012-05-21 22:18 ` Nicholas A. Bellinger 0 siblings, 1 reply; 13+ messages in thread From: Nicholas A. Bellinger @ 2012-05-19 1:26 UTC (permalink / raw) To: Jörn Engel Cc: Roland Dreier, target-devel, Arun Easi, linux-scsi, Steve Hodgson On Fri, 2012-05-18 at 18:49 -0400, Jörn Engel wrote: > On Thu, 17 May 2012 23:09:27 -0700, Nicholas A. Bellinger wrote: > > > > So that means your testing with it now, right..? > > Our kernel has diverged too much from yours to easily move patches > back and forth, so no. > > But more importantly, we still don't have a complete patchset. > Ok, I've pushed into lio-core the following WIP patches based on your original patch + s_id + loop_id clearing patch from today: 6fc162d3 tcm_qla2xxx: Clear session s_id + loop_id earlier during shutdown dfebe3b5 tcm_qla2xxx: Convert to TFO->put_session() usage ec7cf009 target: Add TFO->put_session() caller for HW fabric session shutdown Today I've been testing these changes against typical active I/O during tcm_qla2xxx endpoint shutdown, and against explict NodeACL + MappedLUNs removal case. So far with active FCP of the order of ~100K IOPs with random mixed mode 4K blocks, these patches are performing session shutdown and unloading tcm_qla2xxx references as expected. Certainly these need more testing wrt to a number of special cases for active I/O shutdown, but I think they look reasonable enough to put into lio-core for now.. Please have a look and let me know if you have any problems getting it applied for testing into your .39 tree. Thanks, --nab ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-19 1:26 ` Nicholas A. Bellinger @ 2012-05-21 22:18 ` Nicholas A. Bellinger 0 siblings, 0 replies; 13+ messages in thread From: Nicholas A. Bellinger @ 2012-05-21 22:18 UTC (permalink / raw) To: Jörn Engel Cc: Roland Dreier, target-devel, Arun Easi, linux-scsi, Steve Hodgson, Chad Dupuis On Fri, 2012-05-18 at 18:26 -0700, Nicholas A. Bellinger wrote: > On Fri, 2012-05-18 at 18:49 -0400, Jörn Engel wrote: > > On Thu, 17 May 2012 23:09:27 -0700, Nicholas A. Bellinger wrote: > > > > > > So that means your testing with it now, right..? > > > > Our kernel has diverged too much from yours to easily move patches > > back and forth, so no. > > > > But more importantly, we still don't have a complete patchset. > > > > Ok, I've pushed into lio-core the following WIP patches based on your > original patch + s_id + loop_id clearing patch from today: > > 6fc162d3 tcm_qla2xxx: Clear session s_id + loop_id earlier during shutdown > dfebe3b5 tcm_qla2xxx: Convert to TFO->put_session() usage > ec7cf009 target: Add TFO->put_session() caller for HW fabric session shutdown > > Today I've been testing these changes against typical active I/O during > tcm_qla2xxx endpoint shutdown, and against explict NodeACL + MappedLUNs > removal case. So far with active FCP of the order of ~100K IOPs with > random mixed mode 4K blocks, these patches are performing session > shutdown and unloading tcm_qla2xxx references as expected. > > Certainly these need more testing wrt to a number of special cases for > active I/O shutdown, but I think they look reasonable enough to put into > lio-core for now.. > > Please have a look and let me know if you have any problems getting it > applied for testing into your .39 tree. > Quick update here folks, So these three patches have been running over the weekend with active I/O shutdown using the same ~100K IOPs fio randrw mixed load with explicit NodeACL + MappedLUN=0 removal + re-add using qla2xxx rtslib scripts. During 5K test iterations with FC clients pushing fio randrw traffic, thus far there have been no OOPsen or outstanding I/O shutdown hangs during the explicit NodeACL + MappedLUN=0 removal ops.. After the test completed and tcm_qla2xxx was released, while unloading the qla2xxx LLD the following qla_tgt_cmd_cachep leakage warnings appear: [244560.843319] qla2xxx [0000:07:00.1]-4801:15: DPC handler waking up. [244560.850509] qla2xxx [0000:07:00.1]-4802:15: dpc_flags=0x201250. [244560.858036] qla2xxx [0000:07:00.1]-0121:15: Failed to enable receiving of RSCN requests: 0x2. [244560.867640] qla2xxx [0000:07:00.1]-480f:15: Loop resync scheduled. [244560.875896] qla2xxx [0000:07:00.1]-8837:15: F/W Ready - OK. [244560.882222] qla2xxx [0000:07:00.1]-883a:15: fw_state=3 (3, 63eb, 2, 0) curr time=103a47b53. [244560.894437] qla2xxx [0000:07:00.1]-4810:15: Loop resync end. [244560.900844] qla2xxx [0000:07:00.1]-4800:15: DPC handler sleeping. [244561.031733] qla2xxx [0000:07:00.1]-e802:15: tgt ffff8802646bc800, empty(sess_list)=1 sess_count=0 [244561.063875] qla2xxx [0000:07:00.1]-f80b:15: Waiting for 0 IRQ commands to complete (tgt ffff8802646bc800) [244561.074469] qla2xxx [0000:07:00.1]-f80c:15: Stop of tgt ffff8802646bc800 finished [244561.099207] sd 45:0:1:0: alua: Detached [244561.110278] sd 44:0:1:0: alua: Detached [244576.123716] ============================================================================= [244576.132936] BUG qla_tgt_cmd_cachep (Tainted: G O): Objects remaining on kmem_cache_close() [244576.143214] ----------------------------------------------------------------------------- [244576.143215] [244576.154181] INFO: Slab 0xffffea000420ee00 objects=25 used=1 fp=0xffff8801083bf740 flags=0x8000000000004080 [244576.165042] Pid: 22707, comm: rmmod Tainted: G O 3.4.0-rc2+ #65 [244576.172706] Call Trace: [244576.175525] [<ffffffff810c1e07>] slab_err+0x90/0x9e [244576.181159] [<ffffffff8105d901>] ? trace_hardirqs_on+0xd/0xf [244576.187667] [<ffffffff810abc10>] ? free_percpu+0x2c/0x112 [244576.193879] [<ffffffff810c6403>] kmem_cache_destroy+0x152/0x309 [244576.200674] [<ffffffff81099ae1>] ? mempool_destroy+0x43/0x47 [244576.207185] [<ffffffffa03a3782>] qlt_exit+0x3d/0x3f [qla2xxx] [244576.213790] [<ffffffffa03aa8d5>] qla2x00_module_exit+0x79/0xa6 [qla2xxx] [244576.221456] [<ffffffff8106a4bb>] sys_delete_module+0x1fb/0x25f [244576.228154] [<ffffffff811a1404>] ? lockdep_sys_exit_thunk+0x35/0x67 [244576.235337] [<ffffffff81377279>] system_call_fastpath+0x16/0x1b [244576.242134] INFO: Object 0xffff8801083ba2c8 @offset=8904 [244576.248150] ============================================================================= So I think we might (finally) have the target_wait_for_sess_cmds() hang addressed for tcm_qla2xxx with explicit NodeACL shutdown, but are still leaking descriptor memory in some qla_target.c exception path.. I'm now trying to isolate down the load to reproduce this leakage, and figure out why these descriptors are not being included in the ->sess_wait_list used by wait_for_sess_cmds... --nab ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-17 20:49 ` Roland Dreier 2012-05-17 19:28 ` Jörn Engel @ 2012-05-17 21:24 ` Nicholas A. Bellinger 2012-05-17 19:56 ` Jörn Engel 1 sibling, 1 reply; 13+ messages in thread From: Nicholas A. Bellinger @ 2012-05-17 21:24 UTC (permalink / raw) To: Roland Dreier; +Cc: Jörn Engel, target-devel, Arun Easi, linux-scsi On Thu, 2012-05-17 at 13:49 -0700, Roland Dreier wrote: > > Mmmmm, I'd still like to try to avoid (another) atomic op in the fast > > path w/ hardware_lock held if at all possible.. > > atomic_read() is not an atomic op. > I know. This was referring to Joern's comment in the code: * Proper fix would be to atomically drop the refcount and remove the * session from all data structures that would allow lookup. Until * then this tasteless hack shall suffice. > However this patch is not right, see below. > > > Also thinking more about the case in your comment, I'm starting to > > wonder if tcm_qla2xxx_free_session() should be obtaining hardware_lock + > > clearing the s_id and loop_id lookup entries earlier than currently > > done.. > > > > That is, target_wait_for_sess_cmds() is called before clearing the > > nodeacl pointers from s_id + loop_id in tcm_qla2xxx_free_session(), > > which means that we still perform se_node_acl lookups in the case where > > the individual tcm_qla2xxx endpoint was not also being shutdown as > > well.. > > > > So that said, I think your work-around is OK to address the current BUG, > > but the s_id + loop_id clearing order in tcm_qla2xxx_free_session() > > would be having an effect here as well.. > > Yes, in fact we need to clear the session from all lookup tables > even earlier than ->free_session. We need to make sure that as > soon as target_release_session() is called, that session will no > longer be used for any new commands. > > So qla_tgt_unreg_sess() needs to clear the session from the > "by loop_id" table as well as the "by fc_id" table. > Yes, I think it should be fine to move the loop_id and by fc_id lookup removal after the vha->hw->tgt.tgt_ops->clear_nacl_from_fcport_map() call within the call to qlt_unreg_sess() w/ hardware_lock held.. This will require that find_sess_by_* be added to vha->hw->tgt.tgt_ops, and currently needs to avoid tcm_qla2xxx.h structure usage in function parameters to avoid circular include dependencies.. I'll take a look at making this change to test, and post a patch in a bit.. --nab ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] qla_target: Check refcount in find_sess_by_* 2012-05-17 21:24 ` Nicholas A. Bellinger @ 2012-05-17 19:56 ` Jörn Engel 0 siblings, 0 replies; 13+ messages in thread From: Jörn Engel @ 2012-05-17 19:56 UTC (permalink / raw) To: Nicholas A. Bellinger; +Cc: Roland Dreier, target-devel, Arun Easi, linux-scsi On Thu, 17 May 2012 14:24:25 -0700, Nicholas A. Bellinger wrote: > > Yes, I think it should be fine to move the loop_id and by fc_id lookup > removal after the vha->hw->tgt.tgt_ops->clear_nacl_from_fcport_map() > call within the call to qlt_unreg_sess() w/ hardware_lock held.. Just in case it wasn't clear from my other mail: no, that will not be fine. Once the refcount drops to zero it must be impossible to hold or obtain any new references. My patch didn't completely plug this hole and neither does your proposal. If that means the fast path just got slower: tough luck! Anyone can write fast incorrect code. If the only way to write correct code is to be slow, we shall be slow. Jörn -- Doubt is not a pleasant condition, but certainty is an absurd one. -- Voltaire ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2012-05-21 22:18 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20120511143341.GA18753@logfs.org>
[not found] ` <20120517162600.GA18067@logfs.org>
2012-05-17 19:46 ` [PATCH] qla_target: Check refcount in find_sess_by_* Nicholas A. Bellinger
2012-05-17 20:49 ` Roland Dreier
2012-05-17 19:28 ` Jörn Engel
2012-05-17 21:05 ` Jörn Engel
2012-05-17 21:10 ` Jörn Engel
2012-05-18 2:02 ` Nicholas A. Bellinger
2012-05-18 1:15 ` Jörn Engel
2012-05-18 6:09 ` Nicholas A. Bellinger
2012-05-18 22:49 ` Jörn Engel
2012-05-19 1:26 ` Nicholas A. Bellinger
2012-05-21 22:18 ` Nicholas A. Bellinger
2012-05-17 21:24 ` Nicholas A. Bellinger
2012-05-17 19:56 ` Jörn Engel
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.