* [ofa-general][PATCH] mlx4_core: Synch catastrophic flow with module unload
@ 2009-07-13 15:27 Yevgeny Petrilin
2009-07-13 18:14 ` David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Yevgeny Petrilin @ 2009-07-13 15:27 UTC (permalink / raw)
To: Roland Dreier; +Cc: netdev, general
There is a race condition when the mlx4_core module is being unloaded
during the execution of restart task due to catastrophic error.
Added a global mutex that synchs those operations. If the catastrophic task
tries to catch the mutex, and it is already taken, it means that somebody is unloading the
module, and there is no point in executing the restart operation.
If the unload function tries to catch the mutex and it is taken,
it would wait for the catas task to finish and then unload the module.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
drivers/net/mlx4/catas.c | 4 ++++
drivers/net/mlx4/main.c | 6 ++++++
drivers/net/mlx4/mlx4.h | 2 ++
3 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/drivers/net/mlx4/catas.c b/drivers/net/mlx4/catas.c
index aa9674b..e3aa7e9 100644
--- a/drivers/net/mlx4/catas.c
+++ b/drivers/net/mlx4/catas.c
@@ -91,6 +91,9 @@ static void catas_reset(struct work_struct *work)
LIST_HEAD(tlist);
int ret;
+ if (!mutex_trylock(&drv_mutex))
+ return;
+
spin_lock_irq(&catas_lock);
list_splice_init(&catas_list, &tlist);
spin_unlock_irq(&catas_lock);
@@ -103,6 +106,7 @@ static void catas_reset(struct work_struct *work)
else
mlx4_dbg(dev, "Reset succeeded\n");
}
+ mutex_unlock(&drv_mutex);
}
void mlx4_start_catas_poll(struct mlx4_dev *dev)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index dac621b..9cd5123 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -77,6 +77,8 @@ static char mlx4_version[] __devinitdata =
DRV_NAME ": Mellanox ConnectX core driver v"
DRV_VERSION " (" DRV_RELDATE ")\n";
+struct mutex drv_mutex;
+
static struct mlx4_profile default_profile = {
.num_qp = 1 << 17,
.num_srq = 1 << 16,
@@ -1325,6 +1327,8 @@ static int __init mlx4_init(void)
{
int ret;
+ mutex_init(&drv_mutex);
+
if (mlx4_verify_params())
return -EINVAL;
@@ -1340,7 +1344,9 @@ static int __init mlx4_init(void)
static void __exit mlx4_cleanup(void)
{
+ mutex_lock(&drv_mutex);
pci_unregister_driver(&mlx4_driver);
+ mutex_unlock(&drv_mutex);
destroy_workqueue(mlx4_wq);
}
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5bd79c2..bd8fb43 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -284,6 +284,8 @@ struct mlx4_sense {
struct delayed_work sense_poll;
};
+extern struct mutex drv_mutex;
+
struct mlx4_priv {
struct mlx4_dev dev;
--
1.6.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [ofa-general][PATCH] mlx4_core: Synch catastrophic flow with module unload
2009-07-13 15:27 [ofa-general][PATCH] mlx4_core: Synch catastrophic flow with module unload Yevgeny Petrilin
@ 2009-07-13 18:14 ` David Miller
2009-07-13 19:45 ` Roland Dreier
0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2009-07-13 18:14 UTC (permalink / raw)
To: yevgenyp; +Cc: rdreier, general, netdev
From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Mon, 13 Jul 2009 18:27:48 +0300
> There is a race condition when the mlx4_core module is being unloaded
> during the execution of restart task due to catastrophic error.
> Added a global mutex that synchs those operations. If the catastrophic task
> tries to catch the mutex, and it is already taken, it means that somebody is unloading the
> module, and there is no point in executing the restart operation.
> If the unload function tries to catch the mutex and it is taken,
> it would wait for the catas task to finish and then unload the module.
>
> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Applied, thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [ofa-general][PATCH] mlx4_core: Synch catastrophic flow with module unload
2009-07-13 18:14 ` David Miller
@ 2009-07-13 19:45 ` Roland Dreier
2009-07-13 20:09 ` David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Roland Dreier @ 2009-07-13 19:45 UTC (permalink / raw)
To: David Miller; +Cc: yevgenyp, general, netdev
> Applied, thanks.
Dave, please don't apply mlx4_core patches without giving me a chance to
review them. In this case the patch looks buggy to me: I don't see how
it handles, say, hot remove of one device -- it only handles module
removal. And I would hope we could fix this without adding a global
symbol as namespace polluting as "drv_mutex".
Yevgeny didn't even send this patch to you; he just cc'ed netdev as a
courtesy. However I understand that the physical location of mlx4_core
in drivers/net makes it easy to do this. Maybe this is the best
argument in favor of moving the mlx4_core stuff to drivers/shared?
- R.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [ofa-general][PATCH] mlx4_core: Synch catastrophic flow with module unload
2009-07-13 19:45 ` Roland Dreier
@ 2009-07-13 20:09 ` David Miller
2009-07-13 21:53 ` Roland Dreier
0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2009-07-13 20:09 UTC (permalink / raw)
To: rdreier; +Cc: netdev, general
From: Roland Dreier <rdreier@cisco.com>
Date: Mon, 13 Jul 2009 12:45:32 -0700
> > Applied, thanks.
>
> Dave, please don't apply mlx4_core patches without giving me a chance to
> review them. In this case the patch looks buggy to me: I don't see how
> it handles, say, hot remove of one device -- it only handles module
> removal. And I would hope we could fix this without adding a global
> symbol as namespace polluting as "drv_mutex".
>
> Yevgeny didn't even send this patch to you; he just cc'ed netdev as a
> courtesy. However I understand that the physical location of mlx4_core
> in drivers/net makes it easy to do this. Maybe this is the best
> argument in favor of moving the mlx4_core stuff to drivers/shared?
If it gets sent to netdev, it's for a networking driver, and it says
"PATCH" rather than "RFC" or "please review" or "don't apply" you
cannot reasonably expect me to not look into applying the thing.
And if you're saying that patches for this device should start not
going through me, and the tactic to accomplish that is to move the
bulk of the driver into some driver/shared area, that's really weird.
Anyways I didn't push the patch out to kernel.org yet so it's easy for
me to remove it.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [ofa-general][PATCH] mlx4_core: Synch catastrophic flow with module unload
2009-07-13 20:09 ` David Miller
@ 2009-07-13 21:53 ` Roland Dreier
0 siblings, 0 replies; 5+ messages in thread
From: Roland Dreier @ 2009-07-13 21:53 UTC (permalink / raw)
To: David Miller; +Cc: netdev, general
> If it gets sent to netdev, it's for a networking driver, and it says
> "PATCH" rather than "RFC" or "please review" or "don't apply" you
> cannot reasonably expect me to not look into applying the thing.
> And if you're saying that patches for this device should start not
> going through me, and the tactic to accomplish that is to move the
> bulk of the driver into some driver/shared area, that's really weird.
Well, first of all, if a driver, networking or not, has an active
maintainer, I would expect you to give that maintainer a chance to look
at any not-totally-trivial patches affecting that driver.
But in this case, mlx4_core (as opposed to mlx4_en from the same
drivers/net/mlx4 directory) really is not a network driver -- it is a
low-level multiplexer for access to hardware that really is more
InfiniBand than ethernet (with a dash of Fibre Channel thrown in). And
yes, I am saying that making it clearer that mlx4_core is not an
network driver by moving the source to a more appropriate place does
seem to make sense.
> Anyways I didn't push the patch out to kernel.org yet so it's easy for
> me to remove it.
Thanks.
- R.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-07-13 21:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-13 15:27 [ofa-general][PATCH] mlx4_core: Synch catastrophic flow with module unload Yevgeny Petrilin
2009-07-13 18:14 ` David Miller
2009-07-13 19:45 ` Roland Dreier
2009-07-13 20:09 ` David Miller
2009-07-13 21:53 ` Roland Dreier
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).