* [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them
@ 2014-12-12 16:38 Imre Deak
2014-12-12 16:38 ` [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind Imre Deak
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Imre Deak @ 2014-12-12 16:38 UTC (permalink / raw)
To: Greg Kroah-Hartman, Jiri Slaby; +Cc: linux-kernel
System console drivers (without the CON_DRIVER_FLAG_MODULE flag) and
busy drivers bound to a console (as reported by con_is_bound())
shouldn't be unregistered. The current code checks for the
CON_DRIVER_FLAG_INIT flag but this doesn't really correspond to either
of the above two conditions. CON_DRIVER_FLAG_INIT is set whenever its
associated console's con_startup() function is called, which first
happens when the console driver is registered (so before the console
gets bound) and gets cleared when the console gets unbound. The
purpose of this flag is to show if we need to call con_startup() on a
console before we use it.
Based on the above, do_unregister_con_driver() in its current form will
incorrectly allow unregistering a console driver only if it was never
bound, but will refuse to unregister one that was bound and later
unbound. It will also allow unregistering a system console driver.
Fix this by checking for CON_DRIVER_FLAG_MODULE to refuse unregistering
a system console driver and relying on the existing con_is_bound() check
earlier in the function to refuse unregistering a busy console driver.
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
drivers/tty/vt/vt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index b33b00b..1862e89 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3660,7 +3660,7 @@ int do_unregister_con_driver(const struct consw *csw)
struct con_driver *con_driver = ®istered_con_driver[i];
if (con_driver->con == csw &&
- con_driver->flag & CON_DRIVER_FLAG_INIT) {
+ con_driver->flag & CON_DRIVER_FLAG_MODULE) {
vtconsole_deinit_device(con_driver);
device_destroy(vtconsole_class,
MKDEV(0, con_driver->node));
--
1.8.4
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind 2014-12-12 16:38 [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Imre Deak @ 2014-12-12 16:38 ` Imre Deak 2014-12-12 17:53 ` Peter Hurley 2014-12-12 16:38 ` [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order Imre Deak ` (2 subsequent siblings) 3 siblings, 1 reply; 17+ messages in thread From: Imre Deak @ 2014-12-12 16:38 UTC (permalink / raw) To: Greg Kroah-Hartman, Jiri Slaby; +Cc: linux-kernel Currently vt_bind and vt_unbind access at least the con_driver object and registered_con_driver array without holding the console lock. Fix this by locking around the whole function in each case. Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/tty/vt/vt.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 1862e89..5dd1880 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -3312,11 +3312,8 @@ static int vt_bind(struct con_driver *con) if (first == 0 && last == MAX_NR_CONSOLES -1) deflt = 1; - if (first != -1) { - console_lock(); + if (first != -1) do_bind_con_driver(csw, first, last, deflt); - console_unlock(); - } first = -1; last = -1; @@ -3356,9 +3353,7 @@ static int vt_unbind(struct con_driver *con) deflt = 1; if (first != -1) { - console_lock(); ret = do_unbind_con_driver(csw, first, last, deflt); - console_unlock(); if (ret != 0) return ret; } @@ -3388,11 +3383,15 @@ static ssize_t store_bind(struct device *dev, struct device_attribute *attr, struct con_driver *con = dev_get_drvdata(dev); int bind = simple_strtoul(buf, NULL, 0); + console_lock(); + if (bind) vt_bind(con); else vt_unbind(con); + console_unlock(); + return count; } -- 1.8.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind 2014-12-12 16:38 ` [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind Imre Deak @ 2014-12-12 17:53 ` Peter Hurley 0 siblings, 0 replies; 17+ messages in thread From: Peter Hurley @ 2014-12-12 17:53 UTC (permalink / raw) To: Imre Deak, Greg Kroah-Hartman, Jiri Slaby; +Cc: linux-kernel On 12/12/2014 11:38 AM, Imre Deak wrote: > Currently vt_bind and vt_unbind access at least the con_driver object > and registered_con_driver array without holding the console lock. Fix > this by locking around the whole function in each case. Reviewed-by: Peter Hurley <peter@hurleysoftware.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 16:38 [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Imre Deak 2014-12-12 16:38 ` [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind Imre Deak @ 2014-12-12 16:38 ` Imre Deak 2014-12-12 18:32 ` Peter Hurley 2014-12-13 21:14 ` [PATCH v2 " Imre Deak 2014-12-12 17:30 ` [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Peter Hurley 2014-12-13 21:14 ` [PATCH v2 " Imre Deak 3 siblings, 2 replies; 17+ messages in thread From: Imre Deak @ 2014-12-12 16:38 UTC (permalink / raw) To: Greg Kroah-Hartman, Jiri Slaby; +Cc: linux-kernel Currently there is a lock order problem between the console lock and the kernfs s_active lock of the console driver's bind sysfs entry. When writing to the sysfs entry the lock order is first s_active then console lock, when unregistering the console driver via do_unregister_con_driver() the order is the opposite. See the below bugzilla reference for one instance of a lockdep backtrace. Fix this by unregistering the console driver from a deferred work, where we can safely drop the console lock while unregistering the device and corresponding sysfs entries (which in turn acquire s_active). Note that we have to keep the console driver slot in the registered_con_driver array reserved for the driver that's being unregistered until it's fully removed. Otherwise a concurrent call to do_register_con_driver could try to reuse the same slot and fail when registering the corresponding device with a minor index that's still in use. Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 41 insertions(+), 10 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 5dd1880..b9edc77 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -108,6 +108,7 @@ #define CON_DRIVER_FLAG_MODULE 1 #define CON_DRIVER_FLAG_INIT 2 #define CON_DRIVER_FLAG_ATTR 4 +#define CON_DRIVER_FLAG_ZOMBIE 8 struct con_driver { const struct consw *con; @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); static void set_cursor(struct vc_data *vc); static void hide_cursor(struct vc_data *vc); static void console_callback(struct work_struct *ignored); +static void con_driver_unregister_callback(struct work_struct *ignored); static void blank_screen_t(unsigned long dummy); static void set_palette(struct vc_data *vc); @@ -180,6 +182,7 @@ static int blankinterval = 10*60; core_param(consoleblank, blankinterval, int, 0444); static DECLARE_WORK(console_work, console_callback); +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); /* * fg_console is the current virtual console, @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) for (i = 0; i < MAX_NR_CON_DRIVER; i++) { con_driver = ®istered_con_driver[i]; - if (con_driver->con == NULL) { + if (con_driver->con == NULL && + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { con_driver->con = csw; con_driver->desc = desc; con_driver->node = i; @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) if (con_driver->con == csw && con_driver->flag & CON_DRIVER_FLAG_MODULE) { - vtconsole_deinit_device(con_driver); - device_destroy(vtconsole_class, - MKDEV(0, con_driver->node)); con_driver->con = NULL; - con_driver->desc = NULL; - con_driver->dev = NULL; - con_driver->node = 0; - con_driver->flag = 0; - con_driver->first = 0; - con_driver->last = 0; + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; + schedule_work(&con_driver_unregister_work); + return 0; } } @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) } EXPORT_SYMBOL_GPL(do_unregister_con_driver); +static void con_driver_unregister_callback(struct work_struct *ignored) +{ + int i; + + console_lock(); + + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { + struct con_driver *con_driver = ®istered_con_driver[i]; + + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) + continue; + + console_unlock(); + + vtconsole_deinit_device(con_driver); + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); + + if (WARN_ON_ONCE(con_driver->con)) + con_driver->con = NULL; + con_driver->desc = NULL; + con_driver->dev = NULL; + con_driver->node = 0; + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); + con_driver->flag = 0; + con_driver->first = 0; + con_driver->last = 0; + + console_lock(); + } + + console_unlock(); +} + /* * If we support more console drivers, this function is used * when a driver wants to take over some existing consoles -- 1.8.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 16:38 ` [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order Imre Deak @ 2014-12-12 18:32 ` Peter Hurley 2014-12-12 20:29 ` Imre Deak 2014-12-13 21:14 ` [PATCH v2 " Imre Deak 1 sibling, 1 reply; 17+ messages in thread From: Peter Hurley @ 2014-12-12 18:32 UTC (permalink / raw) To: Imre Deak, Greg Kroah-Hartman, Jiri Slaby; +Cc: linux-kernel Hi Imre, On 12/12/2014 11:38 AM, Imre Deak wrote: > Currently there is a lock order problem between the console lock and the > kernfs s_active lock of the console driver's bind sysfs entry. When > writing to the sysfs entry the lock order is first s_active then console > lock, when unregistering the console driver via > do_unregister_con_driver() the order is the opposite. See the below > bugzilla reference for one instance of a lockdep backtrace. This description didn't make sense to me because the driver core doesn't try to take the console_lock. So I had to go pull the lockdep report and I'm not sure I agree with your analysis. I see a three-way dependency which includes the fb atomic notifier call chain? Regards, Peter Hurley > Fix this by unregistering the console driver from a deferred work, where > we can safely drop the console lock while unregistering the device and > corresponding sysfs entries (which in turn acquire s_active). Note that > we have to keep the console driver slot in the registered_con_driver > array reserved for the driver that's being unregistered until it's fully > removed. Otherwise a concurrent call to do_register_con_driver could > try to reuse the same slot and fail when registering the corresponding > device with a minor index that's still in use. > > Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 > Signed-off-by: Imre Deak <imre.deak@intel.com> > --- > drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 41 insertions(+), 10 deletions(-) > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > index 5dd1880..b9edc77 100644 > --- a/drivers/tty/vt/vt.c > +++ b/drivers/tty/vt/vt.c > @@ -108,6 +108,7 @@ > #define CON_DRIVER_FLAG_MODULE 1 > #define CON_DRIVER_FLAG_INIT 2 > #define CON_DRIVER_FLAG_ATTR 4 > +#define CON_DRIVER_FLAG_ZOMBIE 8 > > struct con_driver { > const struct consw *con; > @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); > static void set_cursor(struct vc_data *vc); > static void hide_cursor(struct vc_data *vc); > static void console_callback(struct work_struct *ignored); > +static void con_driver_unregister_callback(struct work_struct *ignored); > static void blank_screen_t(unsigned long dummy); > static void set_palette(struct vc_data *vc); > > @@ -180,6 +182,7 @@ static int blankinterval = 10*60; > core_param(consoleblank, blankinterval, int, 0444); > > static DECLARE_WORK(console_work, console_callback); > +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); > > /* > * fg_console is the current virtual console, > @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) > for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > con_driver = ®istered_con_driver[i]; > > - if (con_driver->con == NULL) { > + if (con_driver->con == NULL && > + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { > con_driver->con = csw; > con_driver->desc = desc; > con_driver->node = i; > @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) > > if (con_driver->con == csw && > con_driver->flag & CON_DRIVER_FLAG_MODULE) { > - vtconsole_deinit_device(con_driver); > - device_destroy(vtconsole_class, > - MKDEV(0, con_driver->node)); > con_driver->con = NULL; > - con_driver->desc = NULL; > - con_driver->dev = NULL; > - con_driver->node = 0; > - con_driver->flag = 0; > - con_driver->first = 0; > - con_driver->last = 0; > + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; > + schedule_work(&con_driver_unregister_work); > + > return 0; > } > } > @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) > } > EXPORT_SYMBOL_GPL(do_unregister_con_driver); > > +static void con_driver_unregister_callback(struct work_struct *ignored) > +{ > + int i; > + > + console_lock(); > + > + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > + struct con_driver *con_driver = ®istered_con_driver[i]; > + > + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) > + continue; > + > + console_unlock(); > + > + vtconsole_deinit_device(con_driver); > + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); > + > + if (WARN_ON_ONCE(con_driver->con)) > + con_driver->con = NULL; > + con_driver->desc = NULL; > + con_driver->dev = NULL; > + con_driver->node = 0; > + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); > + con_driver->flag = 0; > + con_driver->first = 0; > + con_driver->last = 0; > + > + console_lock(); > + } > + > + console_unlock(); > +} > + > /* > * If we support more console drivers, this function is used > * when a driver wants to take over some existing consoles > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 18:32 ` Peter Hurley @ 2014-12-12 20:29 ` Imre Deak 2014-12-12 20:55 ` Imre Deak 2014-12-12 21:03 ` Peter Hurley 0 siblings, 2 replies; 17+ messages in thread From: Imre Deak @ 2014-12-12 20:29 UTC (permalink / raw) To: Peter Hurley; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel Hi Peter, thanks for your review. On Fri, 2014-12-12 at 13:32 -0500, Peter Hurley wrote: > Hi Imre, > > On 12/12/2014 11:38 AM, Imre Deak wrote: > > Currently there is a lock order problem between the console lock and the > > kernfs s_active lock of the console driver's bind sysfs entry. When > > writing to the sysfs entry the lock order is first s_active then console > > lock, when unregistering the console driver via > > do_unregister_con_driver() the order is the opposite. See the below > > bugzilla reference for one instance of a lockdep backtrace. > > This description didn't make sense to me because the driver core doesn't > try to take the console_lock. So I had to go pull the lockdep report > and I'm not sure I agree with your analysis. > > I see a three-way dependency which includes the fb atomic notifier call > chain? >From the lockdep report in the bugzilla ticket I referenced, you can see the following two paths: i915_driver_load() console_lock() -> takes console_sem do_unregister_con_driver() vtconsole_deinit_device() device_remove_file() ... __kernfs_remove() kernfs_drain() -> takes s_active rwsem for the console's bind sysfs entry (tracked via kn->dep_map) vfs_write() for the above console bind sysfs entry kernfs_fop_write() kernfs_get_active() -> takes s_active rwsem for the above sysfs entry ... store_bind() -> takes console_sem So you have console_sem->s_active ordering on one path and s_active->console_sem ordering on the other. This patch gets rid of the ordering problem and the related lockdep warning. --Imre > > Regards, > Peter Hurley > > > Fix this by unregistering the console driver from a deferred work, where > > we can safely drop the console lock while unregistering the device and > > corresponding sysfs entries (which in turn acquire s_active). Note that > > we have to keep the console driver slot in the registered_con_driver > > array reserved for the driver that's being unregistered until it's fully > > removed. Otherwise a concurrent call to do_register_con_driver could > > try to reuse the same slot and fail when registering the corresponding > > device with a minor index that's still in use. > > > > Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > --- > > drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- > > 1 file changed, 41 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > > index 5dd1880..b9edc77 100644 > > --- a/drivers/tty/vt/vt.c > > +++ b/drivers/tty/vt/vt.c > > @@ -108,6 +108,7 @@ > > #define CON_DRIVER_FLAG_MODULE 1 > > #define CON_DRIVER_FLAG_INIT 2 > > #define CON_DRIVER_FLAG_ATTR 4 > > +#define CON_DRIVER_FLAG_ZOMBIE 8 > > > > struct con_driver { > > const struct consw *con; > > @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); > > static void set_cursor(struct vc_data *vc); > > static void hide_cursor(struct vc_data *vc); > > static void console_callback(struct work_struct *ignored); > > +static void con_driver_unregister_callback(struct work_struct *ignored); > > static void blank_screen_t(unsigned long dummy); > > static void set_palette(struct vc_data *vc); > > > > @@ -180,6 +182,7 @@ static int blankinterval = 10*60; > > core_param(consoleblank, blankinterval, int, 0444); > > > > static DECLARE_WORK(console_work, console_callback); > > +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); > > > > /* > > * fg_console is the current virtual console, > > @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) > > for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > > con_driver = ®istered_con_driver[i]; > > > > - if (con_driver->con == NULL) { > > + if (con_driver->con == NULL && > > + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { > > con_driver->con = csw; > > con_driver->desc = desc; > > con_driver->node = i; > > @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) > > > > if (con_driver->con == csw && > > con_driver->flag & CON_DRIVER_FLAG_MODULE) { > > - vtconsole_deinit_device(con_driver); > > - device_destroy(vtconsole_class, > > - MKDEV(0, con_driver->node)); > > con_driver->con = NULL; > > - con_driver->desc = NULL; > > - con_driver->dev = NULL; > > - con_driver->node = 0; > > - con_driver->flag = 0; > > - con_driver->first = 0; > > - con_driver->last = 0; > > + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; > > + schedule_work(&con_driver_unregister_work); > > + > > return 0; > > } > > } > > @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) > > } > > EXPORT_SYMBOL_GPL(do_unregister_con_driver); > > > > +static void con_driver_unregister_callback(struct work_struct *ignored) > > +{ > > + int i; > > + > > + console_lock(); > > + > > + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > > + struct con_driver *con_driver = ®istered_con_driver[i]; > > + > > + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) > > + continue; > > + > > + console_unlock(); > > + > > + vtconsole_deinit_device(con_driver); > > + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); > > + > > + if (WARN_ON_ONCE(con_driver->con)) > > + con_driver->con = NULL; > > + con_driver->desc = NULL; > > + con_driver->dev = NULL; > > + con_driver->node = 0; > > + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); > > + con_driver->flag = 0; > > + con_driver->first = 0; > > + con_driver->last = 0; > > + > > + console_lock(); > > + } > > + > > + console_unlock(); > > +} > > + > > /* > > * If we support more console drivers, this function is used > > * when a driver wants to take over some existing consoles > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 20:29 ` Imre Deak @ 2014-12-12 20:55 ` Imre Deak 2014-12-12 21:03 ` Peter Hurley 1 sibling, 0 replies; 17+ messages in thread From: Imre Deak @ 2014-12-12 20:55 UTC (permalink / raw) To: Peter Hurley; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel On Fri, 2014-12-12 at 22:29 +0200, Imre Deak wrote: > Hi Peter, > > thanks for your review. > > On Fri, 2014-12-12 at 13:32 -0500, Peter Hurley wrote: > > Hi Imre, > > > > On 12/12/2014 11:38 AM, Imre Deak wrote: > > > Currently there is a lock order problem between the console lock and the > > > kernfs s_active lock of the console driver's bind sysfs entry. When > > > writing to the sysfs entry the lock order is first s_active then console > > > lock, when unregistering the console driver via > > > do_unregister_con_driver() the order is the opposite. See the below > > > bugzilla reference for one instance of a lockdep backtrace. > > > > This description didn't make sense to me because the driver core doesn't > > try to take the console_lock. So I had to go pull the lockdep report > > and I'm not sure I agree with your analysis. > > > > I see a three-way dependency which includes the fb atomic notifier call > > chain? > > From the lockdep report in the bugzilla ticket I referenced, you can see > the following two paths: > > i915_driver_load() > console_lock() -> takes console_sem > do_unregister_con_driver() > vtconsole_deinit_device() > device_remove_file() > ... > __kernfs_remove() > kernfs_drain() -> > takes s_active rwsem for the console's bind sysfs entry > (tracked via kn->dep_map) > > > vfs_write() for the above console bind sysfs entry > kernfs_fop_write() > kernfs_get_active() -> > takes s_active rwsem for the above sysfs entry > ... > store_bind() -> takes console_sem > > So you have console_sem->s_active ordering on one path and > s_active->console_sem ordering on the other. > > This patch gets rid of the ordering problem and the related lockdep > warning. > > --Imre > > > > > Regards, > > Peter Hurley > > > > > Fix this by unregistering the console driver from a deferred work, where > > > we can safely drop the console lock while unregistering the device and > > > corresponding sysfs entries (which in turn acquire s_active). Note that > > > we have to keep the console driver slot in the registered_con_driver > > > array reserved for the driver that's being unregistered until it's fully > > > removed. Otherwise a concurrent call to do_register_con_driver could > > > try to reuse the same slot and fail when registering the corresponding > > > device with a minor index that's still in use. > > > > > > Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > --- > > > drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- > > > 1 file changed, 41 insertions(+), 10 deletions(-) > > > > > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > > > index 5dd1880..b9edc77 100644 > > > --- a/drivers/tty/vt/vt.c > > > +++ b/drivers/tty/vt/vt.c > > > @@ -108,6 +108,7 @@ > > > #define CON_DRIVER_FLAG_MODULE 1 > > > #define CON_DRIVER_FLAG_INIT 2 > > > #define CON_DRIVER_FLAG_ATTR 4 > > > +#define CON_DRIVER_FLAG_ZOMBIE 8 > > > > > > struct con_driver { > > > const struct consw *con; > > > @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); > > > static void set_cursor(struct vc_data *vc); > > > static void hide_cursor(struct vc_data *vc); > > > static void console_callback(struct work_struct *ignored); > > > +static void con_driver_unregister_callback(struct work_struct *ignored); > > > static void blank_screen_t(unsigned long dummy); > > > static void set_palette(struct vc_data *vc); > > > > > > @@ -180,6 +182,7 @@ static int blankinterval = 10*60; > > > core_param(consoleblank, blankinterval, int, 0444); > > > > > > static DECLARE_WORK(console_work, console_callback); > > > +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); > > > > > > /* > > > * fg_console is the current virtual console, > > > @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) > > > for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > > > con_driver = ®istered_con_driver[i]; > > > > > > - if (con_driver->con == NULL) { > > > + if (con_driver->con == NULL && > > > + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { > > > con_driver->con = csw; > > > con_driver->desc = desc; > > > con_driver->node = i; > > > @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) > > > > > > if (con_driver->con == csw && > > > con_driver->flag & CON_DRIVER_FLAG_MODULE) { > > > - vtconsole_deinit_device(con_driver); > > > - device_destroy(vtconsole_class, > > > - MKDEV(0, con_driver->node)); > > > con_driver->con = NULL; > > > - con_driver->desc = NULL; > > > - con_driver->dev = NULL; > > > - con_driver->node = 0; > > > - con_driver->flag = 0; > > > - con_driver->first = 0; > > > - con_driver->last = 0; > > > + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; > > > + schedule_work(&con_driver_unregister_work); > > > + > > > return 0; > > > } > > > } > > > @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) > > > } > > > EXPORT_SYMBOL_GPL(do_unregister_con_driver); > > > > > > +static void con_driver_unregister_callback(struct work_struct *ignored) > > > +{ > > > + int i; > > > + > > > + console_lock(); > > > + > > > + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > > > + struct con_driver *con_driver = ®istered_con_driver[i]; > > > + > > > + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) > > > + continue; > > > + > > > + console_unlock(); > > > + > > > + vtconsole_deinit_device(con_driver); > > > + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); Err, I just realized one mistake, the console_lock() call below needs to be moved here, since all the assignments below need to be protected. I'll send a v2 with this fixed. > > > + > > > + if (WARN_ON_ONCE(con_driver->con)) > > > + con_driver->con = NULL; > > > + con_driver->desc = NULL; > > > + con_driver->dev = NULL; > > > + con_driver->node = 0; > > > + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); > > > + con_driver->flag = 0; > > > + con_driver->first = 0; > > > + con_driver->last = 0; > > > + > > > + console_lock(); > > > + } > > > + > > > + console_unlock(); > > > +} > > > + > > > /* > > > * If we support more console drivers, this function is used > > > * when a driver wants to take over some existing consoles > > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 20:29 ` Imre Deak 2014-12-12 20:55 ` Imre Deak @ 2014-12-12 21:03 ` Peter Hurley 2014-12-12 22:03 ` Imre Deak 1 sibling, 1 reply; 17+ messages in thread From: Peter Hurley @ 2014-12-12 21:03 UTC (permalink / raw) To: imre.deak; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel On 12/12/2014 03:29 PM, Imre Deak wrote: > Hi Peter, > > thanks for your review. > > On Fri, 2014-12-12 at 13:32 -0500, Peter Hurley wrote: >> Hi Imre, >> >> On 12/12/2014 11:38 AM, Imre Deak wrote: >>> Currently there is a lock order problem between the console lock and the >>> kernfs s_active lock of the console driver's bind sysfs entry. When >>> writing to the sysfs entry the lock order is first s_active then console >>> lock, when unregistering the console driver via >>> do_unregister_con_driver() the order is the opposite. See the below >>> bugzilla reference for one instance of a lockdep backtrace. >> >> This description didn't make sense to me because the driver core doesn't >> try to take the console_lock. So I had to go pull the lockdep report >> and I'm not sure I agree with your analysis. >> >> I see a three-way dependency which includes the fb atomic notifier call >> chain? > > From the lockdep report in the bugzilla ticket I referenced, you can see > the following two paths: > > i915_driver_load() > console_lock() -> takes console_sem > do_unregister_con_driver() > vtconsole_deinit_device() > device_remove_file() > ... > __kernfs_remove() > kernfs_drain() -> > takes s_active rwsem for the console's bind sysfs entry > (tracked via kn->dep_map) I don't see this call chain. I see: (some obvious intermediate calls redacted) i915_driver_unload do_unregister_framebuffer fb_notifier_call_chain fbcon_event_notify do_unregister_con_driver device_remove_file sysfs_addrm_finish which has console_lock => fb notifier lock => kernfs lock dependency. My point being that this occurs only because the fb notifier call chain requires console_lock => fb notifier lock dependency ordering; that and fbcon assumes that it can do whatever in the notifier. I would like to see more thorough justification for why this change belongs in vt, and not in fbcon. That was my point about the 3-way dependency. Regards, Peter Hurley > vfs_write() for the above console bind sysfs entry > kernfs_fop_write() > kernfs_get_active() -> > takes s_active rwsem for the above sysfs entry > ... > store_bind() -> takes console_sem > > So you have console_sem->s_active ordering on one path and > s_active->console_sem ordering on the other. > > This patch gets rid of the ordering problem and the related lockdep > warning. > > --Imre > >> >> Regards, >> Peter Hurley >> >>> Fix this by unregistering the console driver from a deferred work, where >>> we can safely drop the console lock while unregistering the device and >>> corresponding sysfs entries (which in turn acquire s_active). Note that >>> we have to keep the console driver slot in the registered_con_driver >>> array reserved for the driver that's being unregistered until it's fully >>> removed. Otherwise a concurrent call to do_register_con_driver could >>> try to reuse the same slot and fail when registering the corresponding >>> device with a minor index that's still in use. >>> >>> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 >>> Signed-off-by: Imre Deak <imre.deak@intel.com> >>> --- >>> drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- >>> 1 file changed, 41 insertions(+), 10 deletions(-) >>> >>> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c >>> index 5dd1880..b9edc77 100644 >>> --- a/drivers/tty/vt/vt.c >>> +++ b/drivers/tty/vt/vt.c >>> @@ -108,6 +108,7 @@ >>> #define CON_DRIVER_FLAG_MODULE 1 >>> #define CON_DRIVER_FLAG_INIT 2 >>> #define CON_DRIVER_FLAG_ATTR 4 >>> +#define CON_DRIVER_FLAG_ZOMBIE 8 >>> >>> struct con_driver { >>> const struct consw *con; >>> @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); >>> static void set_cursor(struct vc_data *vc); >>> static void hide_cursor(struct vc_data *vc); >>> static void console_callback(struct work_struct *ignored); >>> +static void con_driver_unregister_callback(struct work_struct *ignored); >>> static void blank_screen_t(unsigned long dummy); >>> static void set_palette(struct vc_data *vc); >>> >>> @@ -180,6 +182,7 @@ static int blankinterval = 10*60; >>> core_param(consoleblank, blankinterval, int, 0444); >>> >>> static DECLARE_WORK(console_work, console_callback); >>> +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); >>> >>> /* >>> * fg_console is the current virtual console, >>> @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) >>> for (i = 0; i < MAX_NR_CON_DRIVER; i++) { >>> con_driver = ®istered_con_driver[i]; >>> >>> - if (con_driver->con == NULL) { >>> + if (con_driver->con == NULL && >>> + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { >>> con_driver->con = csw; >>> con_driver->desc = desc; >>> con_driver->node = i; >>> @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) >>> >>> if (con_driver->con == csw && >>> con_driver->flag & CON_DRIVER_FLAG_MODULE) { >>> - vtconsole_deinit_device(con_driver); >>> - device_destroy(vtconsole_class, >>> - MKDEV(0, con_driver->node)); >>> con_driver->con = NULL; >>> - con_driver->desc = NULL; >>> - con_driver->dev = NULL; >>> - con_driver->node = 0; >>> - con_driver->flag = 0; >>> - con_driver->first = 0; >>> - con_driver->last = 0; >>> + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; >>> + schedule_work(&con_driver_unregister_work); >>> + >>> return 0; >>> } >>> } >>> @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) >>> } >>> EXPORT_SYMBOL_GPL(do_unregister_con_driver); >>> >>> +static void con_driver_unregister_callback(struct work_struct *ignored) >>> +{ >>> + int i; >>> + >>> + console_lock(); >>> + >>> + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { >>> + struct con_driver *con_driver = ®istered_con_driver[i]; >>> + >>> + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) >>> + continue; >>> + >>> + console_unlock(); >>> + >>> + vtconsole_deinit_device(con_driver); >>> + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); >>> + >>> + if (WARN_ON_ONCE(con_driver->con)) >>> + con_driver->con = NULL; >>> + con_driver->desc = NULL; >>> + con_driver->dev = NULL; >>> + con_driver->node = 0; >>> + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); >>> + con_driver->flag = 0; >>> + con_driver->first = 0; >>> + con_driver->last = 0; >>> + >>> + console_lock(); >>> + } >>> + >>> + console_unlock(); >>> +} >>> + >>> /* >>> * If we support more console drivers, this function is used >>> * when a driver wants to take over some existing consoles >>> >> > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 21:03 ` Peter Hurley @ 2014-12-12 22:03 ` Imre Deak 2014-12-12 22:45 ` Peter Hurley 0 siblings, 1 reply; 17+ messages in thread From: Imre Deak @ 2014-12-12 22:03 UTC (permalink / raw) To: Peter Hurley; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel On Fri, 2014-12-12 at 16:03 -0500, Peter Hurley wrote: > On 12/12/2014 03:29 PM, Imre Deak wrote: > > Hi Peter, > > > > thanks for your review. > > > > On Fri, 2014-12-12 at 13:32 -0500, Peter Hurley wrote: > >> Hi Imre, > >> > >> On 12/12/2014 11:38 AM, Imre Deak wrote: > >>> Currently there is a lock order problem between the console lock and the > >>> kernfs s_active lock of the console driver's bind sysfs entry. When > >>> writing to the sysfs entry the lock order is first s_active then console > >>> lock, when unregistering the console driver via > >>> do_unregister_con_driver() the order is the opposite. See the below > >>> bugzilla reference for one instance of a lockdep backtrace. > >> > >> This description didn't make sense to me because the driver core doesn't > >> try to take the console_lock. So I had to go pull the lockdep report > >> and I'm not sure I agree with your analysis. > >> > >> I see a three-way dependency which includes the fb atomic notifier call > >> chain? > > > > From the lockdep report in the bugzilla ticket I referenced, you can see > > the following two paths: > > > > i915_driver_load() > > console_lock() -> takes console_sem > > do_unregister_con_driver() > > vtconsole_deinit_device() > > device_remove_file() > > ... > > __kernfs_remove() > > kernfs_drain() -> > > takes s_active rwsem for the console's bind sysfs entry > > (tracked via kn->dep_map) > > I don't see this call chain. > > I see: (some obvious intermediate calls redacted) > > i915_driver_unload > do_unregister_framebuffer > fb_notifier_call_chain > fbcon_event_notify > do_unregister_con_driver > device_remove_file > sysfs_addrm_finish > > which has console_lock => fb notifier lock => kernfs lock dependency. Ah, right, there are two dmesg logs in the bug report, your sequence is the first one [1] and I talked about the second [2]. [1] https://bugs.freedesktop.org/attachment.cgi?id=87716 [2] https://bugs.freedesktop.org/attachment.cgi?id=107602 > > My point being that this occurs only because the fb notifier call chain > requires console_lock => fb notifier lock dependency ordering; that and > fbcon assumes that it can do whatever in the notifier. Yes this is correct and I don't see any place where we would have the opposite order. The lockdep report in [1] is really just the same problem I described which is the s_active->console_lock dependency: CPU0 CPU1 ---- ---- lock((fb_notifier_list).rwsem); lock(console_lock); lock((fb_notifier_list).rwsem); lock(s_active#114); *** DEADLOCK *** This can lead to a deadlock only because s_active already depends on console_lock. After removing this dependency I can't see any other way these locks could deadlock. > I would like to see more thorough justification for why this change > belongs in vt, and not in fbcon. That was my point about the 3-way > dependency. Since it's really about the dependency between console_lock and s_active and fbcon is not involved in all code paths where we take these locks (like in the [2] case) but vt is. --Imre > > Regards, > Peter Hurley > > > > vfs_write() for the above console bind sysfs entry > > kernfs_fop_write() > > kernfs_get_active() -> > > takes s_active rwsem for the above sysfs entry > > ... > > store_bind() -> takes console_sem > > > > So you have console_sem->s_active ordering on one path and > > s_active->console_sem ordering on the other. > > > > This patch gets rid of the ordering problem and the related lockdep > > warning. > > > > --Imre > > > >> > >> Regards, > >> Peter Hurley > >> > >>> Fix this by unregistering the console driver from a deferred work, where > >>> we can safely drop the console lock while unregistering the device and > >>> corresponding sysfs entries (which in turn acquire s_active). Note that > >>> we have to keep the console driver slot in the registered_con_driver > >>> array reserved for the driver that's being unregistered until it's fully > >>> removed. Otherwise a concurrent call to do_register_con_driver could > >>> try to reuse the same slot and fail when registering the corresponding > >>> device with a minor index that's still in use. > >>> > >>> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 > >>> Signed-off-by: Imre Deak <imre.deak@intel.com> > >>> --- > >>> drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- > >>> 1 file changed, 41 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > >>> index 5dd1880..b9edc77 100644 > >>> --- a/drivers/tty/vt/vt.c > >>> +++ b/drivers/tty/vt/vt.c > >>> @@ -108,6 +108,7 @@ > >>> #define CON_DRIVER_FLAG_MODULE 1 > >>> #define CON_DRIVER_FLAG_INIT 2 > >>> #define CON_DRIVER_FLAG_ATTR 4 > >>> +#define CON_DRIVER_FLAG_ZOMBIE 8 > >>> > >>> struct con_driver { > >>> const struct consw *con; > >>> @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); > >>> static void set_cursor(struct vc_data *vc); > >>> static void hide_cursor(struct vc_data *vc); > >>> static void console_callback(struct work_struct *ignored); > >>> +static void con_driver_unregister_callback(struct work_struct *ignored); > >>> static void blank_screen_t(unsigned long dummy); > >>> static void set_palette(struct vc_data *vc); > >>> > >>> @@ -180,6 +182,7 @@ static int blankinterval = 10*60; > >>> core_param(consoleblank, blankinterval, int, 0444); > >>> > >>> static DECLARE_WORK(console_work, console_callback); > >>> +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); > >>> > >>> /* > >>> * fg_console is the current virtual console, > >>> @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) > >>> for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > >>> con_driver = ®istered_con_driver[i]; > >>> > >>> - if (con_driver->con == NULL) { > >>> + if (con_driver->con == NULL && > >>> + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { > >>> con_driver->con = csw; > >>> con_driver->desc = desc; > >>> con_driver->node = i; > >>> @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) > >>> > >>> if (con_driver->con == csw && > >>> con_driver->flag & CON_DRIVER_FLAG_MODULE) { > >>> - vtconsole_deinit_device(con_driver); > >>> - device_destroy(vtconsole_class, > >>> - MKDEV(0, con_driver->node)); > >>> con_driver->con = NULL; > >>> - con_driver->desc = NULL; > >>> - con_driver->dev = NULL; > >>> - con_driver->node = 0; > >>> - con_driver->flag = 0; > >>> - con_driver->first = 0; > >>> - con_driver->last = 0; > >>> + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; > >>> + schedule_work(&con_driver_unregister_work); > >>> + > >>> return 0; > >>> } > >>> } > >>> @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) > >>> } > >>> EXPORT_SYMBOL_GPL(do_unregister_con_driver); > >>> > >>> +static void con_driver_unregister_callback(struct work_struct *ignored) > >>> +{ > >>> + int i; > >>> + > >>> + console_lock(); > >>> + > >>> + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > >>> + struct con_driver *con_driver = ®istered_con_driver[i]; > >>> + > >>> + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) > >>> + continue; > >>> + > >>> + console_unlock(); > >>> + > >>> + vtconsole_deinit_device(con_driver); > >>> + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); > >>> + > >>> + if (WARN_ON_ONCE(con_driver->con)) > >>> + con_driver->con = NULL; > >>> + con_driver->desc = NULL; > >>> + con_driver->dev = NULL; > >>> + con_driver->node = 0; > >>> + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); > >>> + con_driver->flag = 0; > >>> + con_driver->first = 0; > >>> + con_driver->last = 0; > >>> + > >>> + console_lock(); > >>> + } > >>> + > >>> + console_unlock(); > >>> +} > >>> + > >>> /* > >>> * If we support more console drivers, this function is used > >>> * when a driver wants to take over some existing consoles > >>> > >> > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 22:03 ` Imre Deak @ 2014-12-12 22:45 ` Peter Hurley 2014-12-12 22:58 ` Imre Deak 0 siblings, 1 reply; 17+ messages in thread From: Peter Hurley @ 2014-12-12 22:45 UTC (permalink / raw) To: imre.deak; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel, Daniel Vetter [ +cc Daniel because of the i915 lockdep report ] On 12/12/2014 05:03 PM, Imre Deak wrote: > On Fri, 2014-12-12 at 16:03 -0500, Peter Hurley wrote: >> On 12/12/2014 03:29 PM, Imre Deak wrote: >>> Hi Peter, >>> >>> thanks for your review. >>> >>> On Fri, 2014-12-12 at 13:32 -0500, Peter Hurley wrote: >>>> Hi Imre, >>>> >>>> On 12/12/2014 11:38 AM, Imre Deak wrote: >>>>> Currently there is a lock order problem between the console lock and the >>>>> kernfs s_active lock of the console driver's bind sysfs entry. When >>>>> writing to the sysfs entry the lock order is first s_active then console >>>>> lock, when unregistering the console driver via >>>>> do_unregister_con_driver() the order is the opposite. See the below >>>>> bugzilla reference for one instance of a lockdep backtrace. >>>> >>>> This description didn't make sense to me because the driver core doesn't >>>> try to take the console_lock. So I had to go pull the lockdep report >>>> and I'm not sure I agree with your analysis. >>>> >>>> I see a three-way dependency which includes the fb atomic notifier call >>>> chain? >>> >>> From the lockdep report in the bugzilla ticket I referenced, you can see >>> the following two paths: >>> >>> i915_driver_load() >>> console_lock() -> takes console_sem >>> do_unregister_con_driver() >>> vtconsole_deinit_device() >>> device_remove_file() >>> ... >>> __kernfs_remove() >>> kernfs_drain() -> >>> takes s_active rwsem for the console's bind sysfs entry >>> (tracked via kn->dep_map) >> >> I don't see this call chain. >> >> I see: (some obvious intermediate calls redacted) >> >> i915_driver_unload >> do_unregister_framebuffer >> fb_notifier_call_chain >> fbcon_event_notify >> do_unregister_con_driver >> device_remove_file >> sysfs_addrm_finish >> >> which has console_lock => fb notifier lock => kernfs lock dependency. > > Ah, right, there are two dmesg logs in the bug report, your sequence is > the first one [1] and I talked about the second [2]. > > [1] https://bugs.freedesktop.org/attachment.cgi?id=87716 > [2] https://bugs.freedesktop.org/attachment.cgi?id=107602 Which is why your evidence/justification should be in the commit message rather than presuming that it's self-evident from a reference to a bug report. However, it doesn't seem to me that reference [2] provides more evidence for changing VT; why is i915 the only drm driver trying to unload the vga console from it's .load method? Without that, the problem is back to fbcon. [Aside: istm, the design error here was to expose bind/unbind as sysfs hooks from the VT console code, but that doesn't look remediable. ] Regards, Peter Hurley >> My point being that this occurs only because the fb notifier call chain >> requires console_lock => fb notifier lock dependency ordering; that and >> fbcon assumes that it can do whatever in the notifier. > > Yes this is correct and I don't see any place where we would have the > opposite order. > > The lockdep report in [1] is really just the same problem I described > which is the s_active->console_lock dependency: > > CPU0 CPU1 > ---- ---- > lock((fb_notifier_list).rwsem); > lock(console_lock); > lock((fb_notifier_list).rwsem); > lock(s_active#114); > *** DEADLOCK *** > > This can lead to a deadlock only because s_active already depends on > console_lock. After removing this dependency I can't see any other way > these locks could deadlock. > >> I would like to see more thorough justification for why this change >> belongs in vt, and not in fbcon. That was my point about the 3-way >> dependency. > > Since it's really about the dependency between console_lock and s_active > and fbcon is not involved in all code paths where we take these locks > (like in the [2] case) but vt is. > > --Imre > >> >> Regards, >> Peter Hurley >> >> >>> vfs_write() for the above console bind sysfs entry >>> kernfs_fop_write() >>> kernfs_get_active() -> >>> takes s_active rwsem for the above sysfs entry >>> ... >>> store_bind() -> takes console_sem >>> >>> So you have console_sem->s_active ordering on one path and >>> s_active->console_sem ordering on the other. >>> >>> This patch gets rid of the ordering problem and the related lockdep >>> warning. >>> >>> --Imre >>> >>>> >>>> Regards, >>>> Peter Hurley >>>> >>>>> Fix this by unregistering the console driver from a deferred work, where >>>>> we can safely drop the console lock while unregistering the device and >>>>> corresponding sysfs entries (which in turn acquire s_active). Note that >>>>> we have to keep the console driver slot in the registered_con_driver >>>>> array reserved for the driver that's being unregistered until it's fully >>>>> removed. Otherwise a concurrent call to do_register_con_driver could >>>>> try to reuse the same slot and fail when registering the corresponding >>>>> device with a minor index that's still in use. >>>>> >>>>> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 >>>>> Signed-off-by: Imre Deak <imre.deak@intel.com> >>>>> --- >>>>> drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- >>>>> 1 file changed, 41 insertions(+), 10 deletions(-) >>>>> >>>>> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c >>>>> index 5dd1880..b9edc77 100644 >>>>> --- a/drivers/tty/vt/vt.c >>>>> +++ b/drivers/tty/vt/vt.c >>>>> @@ -108,6 +108,7 @@ >>>>> #define CON_DRIVER_FLAG_MODULE 1 >>>>> #define CON_DRIVER_FLAG_INIT 2 >>>>> #define CON_DRIVER_FLAG_ATTR 4 >>>>> +#define CON_DRIVER_FLAG_ZOMBIE 8 >>>>> >>>>> struct con_driver { >>>>> const struct consw *con; >>>>> @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); >>>>> static void set_cursor(struct vc_data *vc); >>>>> static void hide_cursor(struct vc_data *vc); >>>>> static void console_callback(struct work_struct *ignored); >>>>> +static void con_driver_unregister_callback(struct work_struct *ignored); >>>>> static void blank_screen_t(unsigned long dummy); >>>>> static void set_palette(struct vc_data *vc); >>>>> >>>>> @@ -180,6 +182,7 @@ static int blankinterval = 10*60; >>>>> core_param(consoleblank, blankinterval, int, 0444); >>>>> >>>>> static DECLARE_WORK(console_work, console_callback); >>>>> +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); >>>>> >>>>> /* >>>>> * fg_console is the current virtual console, >>>>> @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) >>>>> for (i = 0; i < MAX_NR_CON_DRIVER; i++) { >>>>> con_driver = ®istered_con_driver[i]; >>>>> >>>>> - if (con_driver->con == NULL) { >>>>> + if (con_driver->con == NULL && >>>>> + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { >>>>> con_driver->con = csw; >>>>> con_driver->desc = desc; >>>>> con_driver->node = i; >>>>> @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) >>>>> >>>>> if (con_driver->con == csw && >>>>> con_driver->flag & CON_DRIVER_FLAG_MODULE) { >>>>> - vtconsole_deinit_device(con_driver); >>>>> - device_destroy(vtconsole_class, >>>>> - MKDEV(0, con_driver->node)); >>>>> con_driver->con = NULL; >>>>> - con_driver->desc = NULL; >>>>> - con_driver->dev = NULL; >>>>> - con_driver->node = 0; >>>>> - con_driver->flag = 0; >>>>> - con_driver->first = 0; >>>>> - con_driver->last = 0; >>>>> + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; >>>>> + schedule_work(&con_driver_unregister_work); >>>>> + >>>>> return 0; >>>>> } >>>>> } >>>>> @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) >>>>> } >>>>> EXPORT_SYMBOL_GPL(do_unregister_con_driver); >>>>> >>>>> +static void con_driver_unregister_callback(struct work_struct *ignored) >>>>> +{ >>>>> + int i; >>>>> + >>>>> + console_lock(); >>>>> + >>>>> + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { >>>>> + struct con_driver *con_driver = ®istered_con_driver[i]; >>>>> + >>>>> + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) >>>>> + continue; >>>>> + >>>>> + console_unlock(); >>>>> + >>>>> + vtconsole_deinit_device(con_driver); >>>>> + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); >>>>> + >>>>> + if (WARN_ON_ONCE(con_driver->con)) >>>>> + con_driver->con = NULL; >>>>> + con_driver->desc = NULL; >>>>> + con_driver->dev = NULL; >>>>> + con_driver->node = 0; >>>>> + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); >>>>> + con_driver->flag = 0; >>>>> + con_driver->first = 0; >>>>> + con_driver->last = 0; >>>>> + >>>>> + console_lock(); >>>>> + } >>>>> + >>>>> + console_unlock(); >>>>> +} >>>>> + >>>>> /* >>>>> * If we support more console drivers, this function is used >>>>> * when a driver wants to take over some existing consoles >>>>> >>>> >>> >>> >> > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 22:45 ` Peter Hurley @ 2014-12-12 22:58 ` Imre Deak 0 siblings, 0 replies; 17+ messages in thread From: Imre Deak @ 2014-12-12 22:58 UTC (permalink / raw) To: Peter Hurley; +Cc: Greg Kroah-Hartman, Jiri Slaby, linux-kernel, Daniel Vetter On Fri, 2014-12-12 at 17:45 -0500, Peter Hurley wrote: > [ +cc Daniel because of the i915 lockdep report ] > > On 12/12/2014 05:03 PM, Imre Deak wrote: > > On Fri, 2014-12-12 at 16:03 -0500, Peter Hurley wrote: > >> On 12/12/2014 03:29 PM, Imre Deak wrote: > >>> Hi Peter, > >>> > >>> thanks for your review. > >>> > >>> On Fri, 2014-12-12 at 13:32 -0500, Peter Hurley wrote: > >>>> Hi Imre, > >>>> > >>>> On 12/12/2014 11:38 AM, Imre Deak wrote: > >>>>> Currently there is a lock order problem between the console lock and the > >>>>> kernfs s_active lock of the console driver's bind sysfs entry. When > >>>>> writing to the sysfs entry the lock order is first s_active then console > >>>>> lock, when unregistering the console driver via > >>>>> do_unregister_con_driver() the order is the opposite. See the below > >>>>> bugzilla reference for one instance of a lockdep backtrace. > >>>> > >>>> This description didn't make sense to me because the driver core doesn't > >>>> try to take the console_lock. So I had to go pull the lockdep report > >>>> and I'm not sure I agree with your analysis. > >>>> > >>>> I see a three-way dependency which includes the fb atomic notifier call > >>>> chain? > >>> > >>> From the lockdep report in the bugzilla ticket I referenced, you can see > >>> the following two paths: > >>> > >>> i915_driver_load() > >>> console_lock() -> takes console_sem > >>> do_unregister_con_driver() > >>> vtconsole_deinit_device() > >>> device_remove_file() > >>> ... > >>> __kernfs_remove() > >>> kernfs_drain() -> > >>> takes s_active rwsem for the console's bind sysfs entry > >>> (tracked via kn->dep_map) > >> > >> I don't see this call chain. > >> > >> I see: (some obvious intermediate calls redacted) > >> > >> i915_driver_unload > >> do_unregister_framebuffer > >> fb_notifier_call_chain > >> fbcon_event_notify > >> do_unregister_con_driver > >> device_remove_file > >> sysfs_addrm_finish > >> > >> which has console_lock => fb notifier lock => kernfs lock dependency. > > > > Ah, right, there are two dmesg logs in the bug report, your sequence is > > the first one [1] and I talked about the second [2]. > > > > [1] https://bugs.freedesktop.org/attachment.cgi?id=87716 > > [2] https://bugs.freedesktop.org/attachment.cgi?id=107602 > > > Which is why your evidence/justification should be in the commit message > rather than presuming that it's self-evident from a reference to a bug report. > > However, it doesn't seem to me that reference [2] provides more evidence for > changing VT; why is i915 the only drm driver trying to unload the vga console > from it's .load method? Without that, the problem is back to fbcon. The same problem is present whenever you call do_unregister_con_driver(). This is the case in give_up_console(), which isn't fbcon specific. > [Aside: istm, the design error here was to expose bind/unbind as sysfs hooks > from the VT console code, but that doesn't look remediable. ] > > Regards, > Peter Hurley > > >> My point being that this occurs only because the fb notifier call chain > >> requires console_lock => fb notifier lock dependency ordering; that and > >> fbcon assumes that it can do whatever in the notifier. > > > > Yes this is correct and I don't see any place where we would have the > > opposite order. > > > > The lockdep report in [1] is really just the same problem I described > > which is the s_active->console_lock dependency: > > > > CPU0 CPU1 > > ---- ---- > > lock((fb_notifier_list).rwsem); > > lock(console_lock); > > lock((fb_notifier_list).rwsem); > > lock(s_active#114); > > *** DEADLOCK *** > > > > This can lead to a deadlock only because s_active already depends on > > console_lock. After removing this dependency I can't see any other way > > these locks could deadlock. > > > >> I would like to see more thorough justification for why this change > >> belongs in vt, and not in fbcon. That was my point about the 3-way > >> dependency. > > > > Since it's really about the dependency between console_lock and s_active > > and fbcon is not involved in all code paths where we take these locks > > (like in the [2] case) but vt is. > > > > --Imre > > > >> > >> Regards, > >> Peter Hurley > >> > >> > >>> vfs_write() for the above console bind sysfs entry > >>> kernfs_fop_write() > >>> kernfs_get_active() -> > >>> takes s_active rwsem for the above sysfs entry > >>> ... > >>> store_bind() -> takes console_sem > >>> > >>> So you have console_sem->s_active ordering on one path and > >>> s_active->console_sem ordering on the other. > >>> > >>> This patch gets rid of the ordering problem and the related lockdep > >>> warning. > >>> > >>> --Imre > >>> > >>>> > >>>> Regards, > >>>> Peter Hurley > >>>> > >>>>> Fix this by unregistering the console driver from a deferred work, where > >>>>> we can safely drop the console lock while unregistering the device and > >>>>> corresponding sysfs entries (which in turn acquire s_active). Note that > >>>>> we have to keep the console driver slot in the registered_con_driver > >>>>> array reserved for the driver that's being unregistered until it's fully > >>>>> removed. Otherwise a concurrent call to do_register_con_driver could > >>>>> try to reuse the same slot and fail when registering the corresponding > >>>>> device with a minor index that's still in use. > >>>>> > >>>>> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 > >>>>> Signed-off-by: Imre Deak <imre.deak@intel.com> > >>>>> --- > >>>>> drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++---------- > >>>>> 1 file changed, 41 insertions(+), 10 deletions(-) > >>>>> > >>>>> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > >>>>> index 5dd1880..b9edc77 100644 > >>>>> --- a/drivers/tty/vt/vt.c > >>>>> +++ b/drivers/tty/vt/vt.c > >>>>> @@ -108,6 +108,7 @@ > >>>>> #define CON_DRIVER_FLAG_MODULE 1 > >>>>> #define CON_DRIVER_FLAG_INIT 2 > >>>>> #define CON_DRIVER_FLAG_ATTR 4 > >>>>> +#define CON_DRIVER_FLAG_ZOMBIE 8 > >>>>> > >>>>> struct con_driver { > >>>>> const struct consw *con; > >>>>> @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); > >>>>> static void set_cursor(struct vc_data *vc); > >>>>> static void hide_cursor(struct vc_data *vc); > >>>>> static void console_callback(struct work_struct *ignored); > >>>>> +static void con_driver_unregister_callback(struct work_struct *ignored); > >>>>> static void blank_screen_t(unsigned long dummy); > >>>>> static void set_palette(struct vc_data *vc); > >>>>> > >>>>> @@ -180,6 +182,7 @@ static int blankinterval = 10*60; > >>>>> core_param(consoleblank, blankinterval, int, 0444); > >>>>> > >>>>> static DECLARE_WORK(console_work, console_callback); > >>>>> +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); > >>>>> > >>>>> /* > >>>>> * fg_console is the current virtual console, > >>>>> @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) > >>>>> for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > >>>>> con_driver = ®istered_con_driver[i]; > >>>>> > >>>>> - if (con_driver->con == NULL) { > >>>>> + if (con_driver->con == NULL && > >>>>> + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { > >>>>> con_driver->con = csw; > >>>>> con_driver->desc = desc; > >>>>> con_driver->node = i; > >>>>> @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw) > >>>>> > >>>>> if (con_driver->con == csw && > >>>>> con_driver->flag & CON_DRIVER_FLAG_MODULE) { > >>>>> - vtconsole_deinit_device(con_driver); > >>>>> - device_destroy(vtconsole_class, > >>>>> - MKDEV(0, con_driver->node)); > >>>>> con_driver->con = NULL; > >>>>> - con_driver->desc = NULL; > >>>>> - con_driver->dev = NULL; > >>>>> - con_driver->node = 0; > >>>>> - con_driver->flag = 0; > >>>>> - con_driver->first = 0; > >>>>> - con_driver->last = 0; > >>>>> + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; > >>>>> + schedule_work(&con_driver_unregister_work); > >>>>> + > >>>>> return 0; > >>>>> } > >>>>> } > >>>>> @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw) > >>>>> } > >>>>> EXPORT_SYMBOL_GPL(do_unregister_con_driver); > >>>>> > >>>>> +static void con_driver_unregister_callback(struct work_struct *ignored) > >>>>> +{ > >>>>> + int i; > >>>>> + > >>>>> + console_lock(); > >>>>> + > >>>>> + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { > >>>>> + struct con_driver *con_driver = ®istered_con_driver[i]; > >>>>> + > >>>>> + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) > >>>>> + continue; > >>>>> + > >>>>> + console_unlock(); > >>>>> + > >>>>> + vtconsole_deinit_device(con_driver); > >>>>> + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); > >>>>> + > >>>>> + if (WARN_ON_ONCE(con_driver->con)) > >>>>> + con_driver->con = NULL; > >>>>> + con_driver->desc = NULL; > >>>>> + con_driver->dev = NULL; > >>>>> + con_driver->node = 0; > >>>>> + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); > >>>>> + con_driver->flag = 0; > >>>>> + con_driver->first = 0; > >>>>> + con_driver->last = 0; > >>>>> + > >>>>> + console_lock(); > >>>>> + } > >>>>> + > >>>>> + console_unlock(); > >>>>> +} > >>>>> + > >>>>> /* > >>>>> * If we support more console drivers, this function is used > >>>>> * when a driver wants to take over some existing consoles > >>>>> > >>>> > >>> > >>> > >> > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-12 16:38 ` [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order Imre Deak 2014-12-12 18:32 ` Peter Hurley @ 2014-12-13 21:14 ` Imre Deak 2014-12-13 23:07 ` [PATCH v3 " Imre Deak 1 sibling, 1 reply; 17+ messages in thread From: Imre Deak @ 2014-12-13 21:14 UTC (permalink / raw) To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Peter Hurley, Daniel Vetter, linux-kernel Currently there is a lock order problem between the console lock and the kernfs s_active lock of the console driver's bind sysfs entry. When writing to the sysfs entry the lock order is first s_active then console lock, when unregistering the console driver via do_unregister_con_driver() the order is the opposite. See the below bugzilla reference for one instance of a lockdep backtrace. Fix this by unregistering the console driver from a deferred work, where we can safely drop the console lock while unregistering the device and corresponding sysfs entries (which in turn acquire s_active). Note that we have to keep the console driver slot in the registered_con_driver array reserved for the driver that's being unregistered until it's fully removed. Otherwise a concurrent call to do_register_con_driver could try to reuse the same slot and fail when registering the corresponding device with a minor index that's still in use. Note that the referenced bug report contains two dmesg logs with two distinct lockdep reports: the first one [1] is about a locking scenario involving s_active, console_lock and the fb_notifier list lock, while the second [2] is about a locking scenario involving only s_active and console_lock. In [1] locking fb_notifier triggers the lockdep warning only because of its dependence on console_lock, otherwise case [1] is the same s_active<->console_lock dependency problem fixed by this patch. In other words the fb_notifier lock doesn't contribute to this locking problem, we always acquire it before both s_active and console_lock. Thus this locking problem is independent of the fb or fbcon code (owning the fb_notifier lock) and is instead inherent in the vt code creating the console bind sysfs entry (and so owning the s_active lock). [1] https://bugs.freedesktop.org/attachment.cgi?id=87716 [2] https://bugs.freedesktop.org/attachment.cgi?id=107602 v2: - get console_lock earlier in con_driver_unregister_callback(), so we protect the following console driver field assignments there - add code coment explaining the reason for deferring the sysfs entry removal - add a third paragraph to the commit message explaining why there are two distinct lockdep reports and that this issue is independent of fb/fbcon. (Peter Hurley) Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/tty/vt/vt.c | 61 ++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 51 insertions(+), 10 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 5dd1880..c19333f 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -108,6 +108,7 @@ #define CON_DRIVER_FLAG_MODULE 1 #define CON_DRIVER_FLAG_INIT 2 #define CON_DRIVER_FLAG_ATTR 4 +#define CON_DRIVER_FLAG_ZOMBIE 8 struct con_driver { const struct consw *con; @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); static void set_cursor(struct vc_data *vc); static void hide_cursor(struct vc_data *vc); static void console_callback(struct work_struct *ignored); +static void con_driver_unregister_callback(struct work_struct *ignored); static void blank_screen_t(unsigned long dummy); static void set_palette(struct vc_data *vc); @@ -180,6 +182,7 @@ static int blankinterval = 10*60; core_param(consoleblank, blankinterval, int, 0444); static DECLARE_WORK(console_work, console_callback); +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); /* * fg_console is the current virtual console, @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) for (i = 0; i < MAX_NR_CON_DRIVER; i++) { con_driver = ®istered_con_driver[i]; - if (con_driver->con == NULL) { + if (con_driver->con == NULL && + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { con_driver->con = csw; con_driver->desc = desc; con_driver->node = i; @@ -3660,16 +3664,20 @@ int do_unregister_con_driver(const struct consw *csw) if (con_driver->con == csw && con_driver->flag & CON_DRIVER_FLAG_MODULE) { - vtconsole_deinit_device(con_driver); - device_destroy(vtconsole_class, - MKDEV(0, con_driver->node)); + /* + * Defer the removal of the sysfs entries since that + * will acquire the kernfs s_active lock and we can't + * acquire this lock while holding the console lock: + * the unbind sysfs entry imposes already the opposite + * order. Reset con already here to prevent any later + * lookup to succeed and mark this slot as zombie, so + * it won't get reused until we complete the removal + * in the deferred work. + */ con_driver->con = NULL; - con_driver->desc = NULL; - con_driver->dev = NULL; - con_driver->node = 0; - con_driver->flag = 0; - con_driver->first = 0; - con_driver->last = 0; + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; + schedule_work(&con_driver_unregister_work); + return 0; } } @@ -3678,6 +3686,39 @@ int do_unregister_con_driver(const struct consw *csw) } EXPORT_SYMBOL_GPL(do_unregister_con_driver); +static void con_driver_unregister_callback(struct work_struct *ignored) +{ + int i; + + console_lock(); + + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { + struct con_driver *con_driver = ®istered_con_driver[i]; + + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) + continue; + + console_unlock(); + + vtconsole_deinit_device(con_driver); + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); + + console_lock(); + + if (WARN_ON_ONCE(con_driver->con)) + con_driver->con = NULL; + con_driver->desc = NULL; + con_driver->dev = NULL; + con_driver->node = 0; + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); + con_driver->flag = 0; + con_driver->first = 0; + con_driver->last = 0; + } + + console_unlock(); +} + /* * If we support more console drivers, this function is used * when a driver wants to take over some existing consoles -- 1.8.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 3/3] vt: fix console lock vs. kernfs s_active lock order 2014-12-13 21:14 ` [PATCH v2 " Imre Deak @ 2014-12-13 23:07 ` Imre Deak 0 siblings, 0 replies; 17+ messages in thread From: Imre Deak @ 2014-12-13 23:07 UTC (permalink / raw) To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Peter Hurley, Daniel Vetter, linux-kernel Currently there is a lock order problem between the console lock and the kernfs s_active lock of the console driver's bind sysfs entry. When writing to the sysfs entry the lock order is first s_active then console lock, when unregistering the console driver via do_unregister_con_driver() the order is the opposite. See the below bugzilla reference for one instance of a lockdep backtrace. Fix this by unregistering the console driver from a deferred work, where we can safely drop the console lock while unregistering the device and corresponding sysfs entries (which in turn acquire s_active). Note that we have to keep the console driver slot in the registered_con_driver array reserved for the driver that's being unregistered until it's fully removed. Otherwise a concurrent call to do_register_con_driver could try to reuse the same slot and fail when registering the corresponding device with a minor index that's still in use. Note that the referenced bug report contains two dmesg logs with two distinct lockdep reports: [1] is about a locking scenario involving s_active, console_lock and the fb_notifier list lock, while [2] is about a locking scenario involving only s_active and console_lock. In [1] locking fb_notifier triggers the lockdep warning only because of its dependence on console_lock, otherwise case [1] is the same s_active<->console_lock dependency problem fixed by this patch. Before this change we have the following locking scenarios involving the 3 locks: a) via do_unregister_framebuffer()->...->do_unregister_con_driver(): 1. console lock 2. fb_notifier lock 3. s_active lock b) for example via give_up_console()->do_unregister_con_driver(): 1. console lock 2. s_active lock c) via vt_bind()/vt_unbind(): 1. s_active lock 2. console lock Since c) is the console bind sysfs entry's write code path we can't change the locking order there. We can only fix this issue by removing s_active's dependence on the other two locks in a) and b). We can do this only in the vt code which owns the corresponding sysfs entry, so that after the change we have: a) 1. console lock 2. fb_notifier lock b) 1. console lock c) 1. s_active lock 2. console lock d) in the new con_driver_unregister_callback(): 1. s_active lock [1] https://bugs.freedesktop.org/attachment.cgi?id=87716 [2] https://bugs.freedesktop.org/attachment.cgi?id=107602 v2: - get console_lock earlier in con_driver_unregister_callback(), so we protect the following console driver field assignments there - add code coment explaining the reason for deferring the sysfs entry removal - add a third paragraph to the commit message explaining why there are two distinct lockdep reports and that this issue is independent of fb/fbcon. (Peter Hurley) v3: - clarify further the third paragraph Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523 Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/tty/vt/vt.c | 61 ++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 51 insertions(+), 10 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 5dd1880..c19333f 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -108,6 +108,7 @@ #define CON_DRIVER_FLAG_MODULE 1 #define CON_DRIVER_FLAG_INIT 2 #define CON_DRIVER_FLAG_ATTR 4 +#define CON_DRIVER_FLAG_ZOMBIE 8 struct con_driver { const struct consw *con; @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p); static void set_cursor(struct vc_data *vc); static void hide_cursor(struct vc_data *vc); static void console_callback(struct work_struct *ignored); +static void con_driver_unregister_callback(struct work_struct *ignored); static void blank_screen_t(unsigned long dummy); static void set_palette(struct vc_data *vc); @@ -180,6 +182,7 @@ static int blankinterval = 10*60; core_param(consoleblank, blankinterval, int, 0444); static DECLARE_WORK(console_work, console_callback); +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback); /* * fg_console is the current virtual console, @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last) for (i = 0; i < MAX_NR_CON_DRIVER; i++) { con_driver = ®istered_con_driver[i]; - if (con_driver->con == NULL) { + if (con_driver->con == NULL && + !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) { con_driver->con = csw; con_driver->desc = desc; con_driver->node = i; @@ -3660,16 +3664,20 @@ int do_unregister_con_driver(const struct consw *csw) if (con_driver->con == csw && con_driver->flag & CON_DRIVER_FLAG_MODULE) { - vtconsole_deinit_device(con_driver); - device_destroy(vtconsole_class, - MKDEV(0, con_driver->node)); + /* + * Defer the removal of the sysfs entries since that + * will acquire the kernfs s_active lock and we can't + * acquire this lock while holding the console lock: + * the unbind sysfs entry imposes already the opposite + * order. Reset con already here to prevent any later + * lookup to succeed and mark this slot as zombie, so + * it won't get reused until we complete the removal + * in the deferred work. + */ con_driver->con = NULL; - con_driver->desc = NULL; - con_driver->dev = NULL; - con_driver->node = 0; - con_driver->flag = 0; - con_driver->first = 0; - con_driver->last = 0; + con_driver->flag = CON_DRIVER_FLAG_ZOMBIE; + schedule_work(&con_driver_unregister_work); + return 0; } } @@ -3678,6 +3686,39 @@ int do_unregister_con_driver(const struct consw *csw) } EXPORT_SYMBOL_GPL(do_unregister_con_driver); +static void con_driver_unregister_callback(struct work_struct *ignored) +{ + int i; + + console_lock(); + + for (i = 0; i < MAX_NR_CON_DRIVER; i++) { + struct con_driver *con_driver = ®istered_con_driver[i]; + + if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) + continue; + + console_unlock(); + + vtconsole_deinit_device(con_driver); + device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); + + console_lock(); + + if (WARN_ON_ONCE(con_driver->con)) + con_driver->con = NULL; + con_driver->desc = NULL; + con_driver->dev = NULL; + con_driver->node = 0; + WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE); + con_driver->flag = 0; + con_driver->first = 0; + con_driver->last = 0; + } + + console_unlock(); +} + /* * If we support more console drivers, this function is used * when a driver wants to take over some existing consoles -- 1.8.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them 2014-12-12 16:38 [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Imre Deak 2014-12-12 16:38 ` [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind Imre Deak 2014-12-12 16:38 ` [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order Imre Deak @ 2014-12-12 17:30 ` Peter Hurley 2014-12-13 21:14 ` [PATCH v2 " Imre Deak 3 siblings, 0 replies; 17+ messages in thread From: Peter Hurley @ 2014-12-12 17:30 UTC (permalink / raw) To: Imre Deak, Greg Kroah-Hartman, Jiri Slaby; +Cc: linux-kernel Hi Imre, On 12/12/2014 11:38 AM, Imre Deak wrote: > System console drivers (without the CON_DRIVER_FLAG_MODULE flag) and > busy drivers bound to a console (as reported by con_is_bound()) > shouldn't be unregistered. The current code checks for the > CON_DRIVER_FLAG_INIT flag but this doesn't really correspond to either > of the above two conditions. CON_DRIVER_FLAG_INIT is set whenever its > associated console's con_startup() function is called, which first > happens when the console driver is registered (so before the console > gets bound) and gets cleared when the console gets unbound. The > purpose of this flag is to show if we need to call con_startup() on a > console before we use it. > > Based on the above, do_unregister_con_driver() in its current form will > incorrectly allow unregistering a console driver only if it was never > bound, but will refuse to unregister one that was bound and later > unbound. It will also allow unregistering a system console driver. > > Fix this by checking for CON_DRIVER_FLAG_MODULE to refuse unregistering > a system console driver and relying on the existing con_is_bound() check > earlier in the function to refuse unregistering a busy console driver. Maybe reword these two paragraphs? "Allow non-system console drivers to unregister, and prevent system console drivers from unregistering." Regards, Peter Hurley > Signed-off-by: Imre Deak <imre.deak@intel.com> > --- > drivers/tty/vt/vt.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > index b33b00b..1862e89 100644 > --- a/drivers/tty/vt/vt.c > +++ b/drivers/tty/vt/vt.c > @@ -3660,7 +3660,7 @@ int do_unregister_con_driver(const struct consw *csw) > struct con_driver *con_driver = ®istered_con_driver[i]; > > if (con_driver->con == csw && > - con_driver->flag & CON_DRIVER_FLAG_INIT) { > + con_driver->flag & CON_DRIVER_FLAG_MODULE) { > vtconsole_deinit_device(con_driver); > device_destroy(vtconsole_class, > MKDEV(0, con_driver->node)); > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/3] vt: fix check for system/busy console drivers when unregistering them 2014-12-12 16:38 [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Imre Deak ` (2 preceding siblings ...) 2014-12-12 17:30 ` [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Peter Hurley @ 2014-12-13 21:14 ` Imre Deak 2014-12-15 15:05 ` Daniel Vetter 3 siblings, 1 reply; 17+ messages in thread From: Imre Deak @ 2014-12-13 21:14 UTC (permalink / raw) To: Greg Kroah-Hartman, Jiri Slaby; +Cc: Peter Hurley, Daniel Vetter, linux-kernel System console drivers (without the CON_DRIVER_FLAG_MODULE flag) and busy drivers bound to a console (as reported by con_is_bound()) shouldn't be unregistered. The current code checks for the CON_DRIVER_FLAG_INIT flag but this doesn't really correspond to either of the above two conditions. CON_DRIVER_FLAG_INIT is set whenever its associated console's con_startup() function is called, which first happens when the console driver is registered (so before the console gets bound) and gets cleared when the console gets unbound. The purpose of this flag is to show if we need to call con_startup() on a console before we use it. Based on the above, do_unregister_con_driver() in its current form will incorrectly allow unregistering a console driver only if it was never bound, but will refuse to unregister one that was bound and later unbound. It will also allow unregistering a system console driver. Fix this by checking for CON_DRIVER_FLAG_MODULE to allow non-system console drivers to unregister and prevent system console drivers from unregistering. Rely on the existing con_is_bound() check earlier in the function to refuse unregistering a busy console driver. v2: - reword the third paragraph to clarify how the fix works (Peter Hurley) Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/tty/vt/vt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index b33b00b..1862e89 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -3660,7 +3660,7 @@ int do_unregister_con_driver(const struct consw *csw) struct con_driver *con_driver = ®istered_con_driver[i]; if (con_driver->con == csw && - con_driver->flag & CON_DRIVER_FLAG_INIT) { + con_driver->flag & CON_DRIVER_FLAG_MODULE) { vtconsole_deinit_device(con_driver); device_destroy(vtconsole_class, MKDEV(0, con_driver->node)); -- 1.8.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/3] vt: fix check for system/busy console drivers when unregistering them 2014-12-13 21:14 ` [PATCH v2 " Imre Deak @ 2014-12-15 15:05 ` Daniel Vetter 2014-12-15 16:08 ` Imre Deak 0 siblings, 1 reply; 17+ messages in thread From: Daniel Vetter @ 2014-12-15 15:05 UTC (permalink / raw) To: Imre Deak Cc: Greg Kroah-Hartman, Jiri Slaby, Peter Hurley, Linux Kernel Mailing List This seems to partially revert commit d9c660e750fdf982e1e2bdd0e76c1e6c4db4217b Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Jun 5 16:29:56 2014 +0200 vt: Fix up unregistration of vt drivers A bunch of issues: - We should not kick out the default console (which is tracked in conswitchp), so check for that. - Add better error codes so callers can differentiate between "something went wrong" and "your driver isn't registered already". i915 needs that so it doesn't fall over when reloading the driver and hence vga_con is already unregistered. - There's a mess with the driver flags: What we need to check for is that the driver isn't used any more, i.e. unbound completely (FLAG_INIT). And not whether it's the boot console or not (which is the only one which doesn't have FLAG_MODULE). Otherwise there's no way to kick out the boot console, which i915 wants to do to prevent havoc with vga_con interferring (which tends to hang machines). Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.cz> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> We really need to unregister vgacon when i915 loads since we completely kill vga support - just unbinding is not enough since then vgacon will be resurrect on module unload, killing the machine. And module unload is really important to stay sane as kernel driver hacker. If we need more precise checks for unregistering then I think we need to fix up that mess with the flags ... -Daniel On Sat, Dec 13, 2014 at 10:14 PM, Imre Deak <imre.deak@intel.com> wrote: > System console drivers (without the CON_DRIVER_FLAG_MODULE flag) and > busy drivers bound to a console (as reported by con_is_bound()) > shouldn't be unregistered. The current code checks for the > CON_DRIVER_FLAG_INIT flag but this doesn't really correspond to either > of the above two conditions. CON_DRIVER_FLAG_INIT is set whenever its > associated console's con_startup() function is called, which first > happens when the console driver is registered (so before the console > gets bound) and gets cleared when the console gets unbound. The > purpose of this flag is to show if we need to call con_startup() on a > console before we use it. > > Based on the above, do_unregister_con_driver() in its current form will > incorrectly allow unregistering a console driver only if it was never > bound, but will refuse to unregister one that was bound and later > unbound. It will also allow unregistering a system console driver. > > Fix this by checking for CON_DRIVER_FLAG_MODULE to allow non-system > console drivers to unregister and prevent system console drivers from > unregistering. Rely on the existing con_is_bound() check earlier in the > function to refuse unregistering a busy console driver. > > v2: > - reword the third paragraph to clarify how the fix works (Peter Hurley) > > Signed-off-by: Imre Deak <imre.deak@intel.com> > --- > drivers/tty/vt/vt.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > index b33b00b..1862e89 100644 > --- a/drivers/tty/vt/vt.c > +++ b/drivers/tty/vt/vt.c > @@ -3660,7 +3660,7 @@ int do_unregister_con_driver(const struct consw *csw) > struct con_driver *con_driver = ®istered_con_driver[i]; > > if (con_driver->con == csw && > - con_driver->flag & CON_DRIVER_FLAG_INIT) { > + con_driver->flag & CON_DRIVER_FLAG_MODULE) { > vtconsole_deinit_device(con_driver); > device_destroy(vtconsole_class, > MKDEV(0, con_driver->node)); > -- > 1.8.4 > -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/3] vt: fix check for system/busy console drivers when unregistering them 2014-12-15 15:05 ` Daniel Vetter @ 2014-12-15 16:08 ` Imre Deak 0 siblings, 0 replies; 17+ messages in thread From: Imre Deak @ 2014-12-15 16:08 UTC (permalink / raw) To: Daniel Vetter Cc: Greg Kroah-Hartman, Jiri Slaby, Peter Hurley, Linux Kernel Mailing List On Mon, 2014-12-15 at 16:05 +0100, Daniel Vetter wrote: > This seems to partially revert > > commit d9c660e750fdf982e1e2bdd0e76c1e6c4db4217b > Author: Daniel Vetter <daniel.vetter@ffwll.ch> > Date: Thu Jun 5 16:29:56 2014 +0200 > > vt: Fix up unregistration of vt drivers > > A bunch of issues: > - We should not kick out the default console (which is tracked in > conswitchp), so check for that. > - Add better error codes so callers can differentiate between "something > went wrong" and "your driver isn't registered already". i915 needs > that so it doesn't fall over when reloading the driver and hence > vga_con is already unregistered. > - There's a mess with the driver flags: What we need to check for is > that the driver isn't used any more, i.e. unbound completely (FLAG_INIT). > And not whether it's the boot console or not (which is the only one > which doesn't have FLAG_MODULE). Otherwise there's no way to kick > out the boot console, which i915 wants to do to prevent havoc with > vga_con interferring (which tends to hang machines). > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Jiri Slaby <jslaby@suse.cz> > Reviewed-by: David Herrmann <dh.herrmann@gmail.com> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > We really need to unregister vgacon when i915 loads since we > completely kill vga support - just unbinding is not enough since then > vgacon will be resurrect on module unload, killing the machine. And > module unload is really important to stay sane as kernel driver > hacker. > > If we need more precise checks for unregistering then I think we need > to fix up that mess with the flags ... Ok, after discussing on IRC with Daniel, I agree this would break the module reload case for i915 and we should allow to unload system console drivers too. The only important thing seems to be that we have at least one console driver left, let it be system or non-system, but that's already guaranteed by the (csw == conswitchp) check. So I'll send a new version removing the con_driver->flag check. --Imre > -Daniel > > > On Sat, Dec 13, 2014 at 10:14 PM, Imre Deak <imre.deak@intel.com> wrote: > > System console drivers (without the CON_DRIVER_FLAG_MODULE flag) and > > busy drivers bound to a console (as reported by con_is_bound()) > > shouldn't be unregistered. The current code checks for the > > CON_DRIVER_FLAG_INIT flag but this doesn't really correspond to either > > of the above two conditions. CON_DRIVER_FLAG_INIT is set whenever its > > associated console's con_startup() function is called, which first > > happens when the console driver is registered (so before the console > > gets bound) and gets cleared when the console gets unbound. The > > purpose of this flag is to show if we need to call con_startup() on a > > console before we use it. > > > > Based on the above, do_unregister_con_driver() in its current form will > > incorrectly allow unregistering a console driver only if it was never > > bound, but will refuse to unregister one that was bound and later > > unbound. It will also allow unregistering a system console driver. > > > > Fix this by checking for CON_DRIVER_FLAG_MODULE to allow non-system > > console drivers to unregister and prevent system console drivers from > > unregistering. Rely on the existing con_is_bound() check earlier in the > > function to refuse unregistering a busy console driver. > > > > v2: > > - reword the third paragraph to clarify how the fix works (Peter Hurley) > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > --- > > drivers/tty/vt/vt.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > > index b33b00b..1862e89 100644 > > --- a/drivers/tty/vt/vt.c > > +++ b/drivers/tty/vt/vt.c > > @@ -3660,7 +3660,7 @@ int do_unregister_con_driver(const struct consw *csw) > > struct con_driver *con_driver = ®istered_con_driver[i]; > > > > if (con_driver->con == csw && > > - con_driver->flag & CON_DRIVER_FLAG_INIT) { > > + con_driver->flag & CON_DRIVER_FLAG_MODULE) { > > vtconsole_deinit_device(con_driver); > > device_destroy(vtconsole_class, > > MKDEV(0, con_driver->node)); > > -- > > 1.8.4 > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2014-12-15 16:08 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-12-12 16:38 [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Imre Deak 2014-12-12 16:38 ` [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind Imre Deak 2014-12-12 17:53 ` Peter Hurley 2014-12-12 16:38 ` [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order Imre Deak 2014-12-12 18:32 ` Peter Hurley 2014-12-12 20:29 ` Imre Deak 2014-12-12 20:55 ` Imre Deak 2014-12-12 21:03 ` Peter Hurley 2014-12-12 22:03 ` Imre Deak 2014-12-12 22:45 ` Peter Hurley 2014-12-12 22:58 ` Imre Deak 2014-12-13 21:14 ` [PATCH v2 " Imre Deak 2014-12-13 23:07 ` [PATCH v3 " Imre Deak 2014-12-12 17:30 ` [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Peter Hurley 2014-12-13 21:14 ` [PATCH v2 " Imre Deak 2014-12-15 15:05 ` Daniel Vetter 2014-12-15 16:08 ` Imre Deak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox