public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Hurley <peter@hurleysoftware.com>
To: imre.deak@intel.com
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jslaby@suse.cz>,
	linux-kernel@vger.kernel.org,
	Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order
Date: Fri, 12 Dec 2014 17:45:23 -0500	[thread overview]
Message-ID: <548B7003.2000502@hurleysoftware.com> (raw)
In-Reply-To: <1418421811.9524.60.camel@ideak-mobl>

[ +cc Daniel because of the i915 lockdep report ]

On 12/12/2014 05:03 PM, Imre Deak wrote:
> On Fri, 2014-12-12 at 16:03 -0500, Peter Hurley wrote:
>> On 12/12/2014 03:29 PM, Imre Deak wrote:
>>> Hi Peter,
>>>
>>> thanks for your review.
>>>
>>> On Fri, 2014-12-12 at 13:32 -0500, Peter Hurley wrote:
>>>> Hi Imre,
>>>>
>>>> On 12/12/2014 11:38 AM, Imre Deak wrote:
>>>>> Currently there is a lock order problem between the console lock and the
>>>>> kernfs s_active lock of the console driver's bind sysfs entry. When
>>>>> writing to the sysfs entry the lock order is first s_active then console
>>>>> lock, when unregistering the console driver via
>>>>> do_unregister_con_driver() the order is the opposite. See the below
>>>>> bugzilla reference for one instance of a lockdep backtrace.
>>>>
>>>> This description didn't make sense to me because the driver core doesn't
>>>> try to take the console_lock. So I had to go pull the lockdep report
>>>> and I'm not sure I agree with your analysis.
>>>>
>>>> I see a three-way dependency which includes the fb atomic notifier call
>>>> chain?
>>>
>>> From the lockdep report in the bugzilla ticket I referenced, you can see
>>> the following two paths:
>>>
>>> i915_driver_load()
>>>   console_lock()  -> takes console_sem
>>>   do_unregister_con_driver()
>>>     vtconsole_deinit_device()
>>>       device_remove_file()
>>>         ...
>>>         __kernfs_remove()
>>>           kernfs_drain() ->
>>>             takes s_active rwsem for the console's bind sysfs entry
>>>             (tracked via kn->dep_map)
>>
>> I don't see this call chain.
>>
>> I see: (some obvious intermediate calls redacted)
>>
>>   i915_driver_unload
>>     do_unregister_framebuffer
>>       fb_notifier_call_chain
>>         fbcon_event_notify
>>           do_unregister_con_driver
>>             device_remove_file
>>               sysfs_addrm_finish
>>
>> which has console_lock => fb notifier lock => kernfs lock dependency.
> 
> Ah, right, there are two dmesg logs in the bug report, your sequence is
> the first one [1] and I talked about the second [2].
> 
> [1] https://bugs.freedesktop.org/attachment.cgi?id=87716
> [2] https://bugs.freedesktop.org/attachment.cgi?id=107602


Which is why your evidence/justification should be in the commit message
rather than presuming that it's self-evident from a reference to a bug report.

However, it doesn't seem to me that reference [2] provides more evidence for
changing VT; why is i915 the only drm driver trying to unload the vga console
from it's .load method?  Without that, the problem is back to fbcon.

[Aside: istm, the design error here was to expose bind/unbind as sysfs hooks
from the VT console code, but that doesn't look remediable. ]

Regards,
Peter Hurley

>> My point being that this occurs only because the fb notifier call chain
>> requires console_lock => fb notifier lock dependency ordering; that and
>> fbcon assumes that it can do whatever in the notifier.
> 
> Yes this is correct and I don't see any place where we would have the
> opposite order.
> 
> The lockdep report in [1] is really just the same problem I described
> which is the s_active->console_lock dependency:
> 
> CPU0                                   CPU1
> ----                                   ----
> lock((fb_notifier_list).rwsem);
>                                        lock(console_lock);
>                                        lock((fb_notifier_list).rwsem);
> lock(s_active#114);
> *** DEADLOCK ***
> 
> This can lead to a deadlock only because s_active already depends on
> console_lock. After removing this dependency I can't see any other way
> these locks could deadlock.
> 
>> I would like to see more thorough justification for why this change
>> belongs in vt, and not in fbcon. That was my point about the 3-way
>> dependency.
> 
> Since it's really about the dependency between console_lock and s_active
> and fbcon is not involved in all code paths where we take these locks
> (like in the [2] case) but vt is.
> 
> --Imre
> 
>>
>> Regards,
>> Peter Hurley
>>
>>
>>> vfs_write()   for the above console bind sysfs entry
>>>   kernfs_fop_write()
>>>     kernfs_get_active() ->
>>>       takes s_active rwsem for the above sysfs entry
>>>     ...
>>>       store_bind() -> takes console_sem
>>>
>>> So you have console_sem->s_active ordering on one path and
>>> s_active->console_sem ordering on the other.
>>>
>>> This patch gets rid of the ordering problem and the related lockdep
>>> warning.
>>>
>>> --Imre
>>>
>>>>
>>>> Regards,
>>>> Peter Hurley
>>>>
>>>>> Fix this by unregistering the console driver from a deferred work, where
>>>>> we can safely drop the console lock while unregistering the device and
>>>>> corresponding sysfs entries (which in turn acquire s_active). Note that
>>>>> we have to keep the console driver slot in the registered_con_driver
>>>>> array reserved for the driver that's being unregistered until it's fully
>>>>> removed. Otherwise a concurrent call to do_register_con_driver could
>>>>> try to reuse the same slot and fail when registering the corresponding
>>>>> device with a minor index that's still in use.
>>>>>
>>>>> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=70523
>>>>> Signed-off-by: Imre Deak <imre.deak@intel.com>
>>>>> ---
>>>>>  drivers/tty/vt/vt.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
>>>>>  1 file changed, 41 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
>>>>> index 5dd1880..b9edc77 100644
>>>>> --- a/drivers/tty/vt/vt.c
>>>>> +++ b/drivers/tty/vt/vt.c
>>>>> @@ -108,6 +108,7 @@
>>>>>  #define CON_DRIVER_FLAG_MODULE 1
>>>>>  #define CON_DRIVER_FLAG_INIT   2
>>>>>  #define CON_DRIVER_FLAG_ATTR   4
>>>>> +#define CON_DRIVER_FLAG_ZOMBIE 8
>>>>>  
>>>>>  struct con_driver {
>>>>>  	const struct consw *con;
>>>>> @@ -153,6 +154,7 @@ static int set_vesa_blanking(char __user *p);
>>>>>  static void set_cursor(struct vc_data *vc);
>>>>>  static void hide_cursor(struct vc_data *vc);
>>>>>  static void console_callback(struct work_struct *ignored);
>>>>> +static void con_driver_unregister_callback(struct work_struct *ignored);
>>>>>  static void blank_screen_t(unsigned long dummy);
>>>>>  static void set_palette(struct vc_data *vc);
>>>>>  
>>>>> @@ -180,6 +182,7 @@ static int blankinterval = 10*60;
>>>>>  core_param(consoleblank, blankinterval, int, 0444);
>>>>>  
>>>>>  static DECLARE_WORK(console_work, console_callback);
>>>>> +static DECLARE_WORK(con_driver_unregister_work, con_driver_unregister_callback);
>>>>>  
>>>>>  /*
>>>>>   * fg_console is the current virtual console,
>>>>> @@ -3597,7 +3600,8 @@ static int do_register_con_driver(const struct consw *csw, int first, int last)
>>>>>  	for (i = 0; i < MAX_NR_CON_DRIVER; i++) {
>>>>>  		con_driver = &registered_con_driver[i];
>>>>>  
>>>>> -		if (con_driver->con == NULL) {
>>>>> +		if (con_driver->con == NULL &&
>>>>> +		    !(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE)) {
>>>>>  			con_driver->con = csw;
>>>>>  			con_driver->desc = desc;
>>>>>  			con_driver->node = i;
>>>>> @@ -3660,16 +3664,10 @@ int do_unregister_con_driver(const struct consw *csw)
>>>>>  
>>>>>  		if (con_driver->con == csw &&
>>>>>  		    con_driver->flag & CON_DRIVER_FLAG_MODULE) {
>>>>> -			vtconsole_deinit_device(con_driver);
>>>>> -			device_destroy(vtconsole_class,
>>>>> -				       MKDEV(0, con_driver->node));
>>>>>  			con_driver->con = NULL;
>>>>> -			con_driver->desc = NULL;
>>>>> -			con_driver->dev = NULL;
>>>>> -			con_driver->node = 0;
>>>>> -			con_driver->flag = 0;
>>>>> -			con_driver->first = 0;
>>>>> -			con_driver->last = 0;
>>>>> +			con_driver->flag = CON_DRIVER_FLAG_ZOMBIE;
>>>>> +			schedule_work(&con_driver_unregister_work);
>>>>> +
>>>>>  			return 0;
>>>>>  		}
>>>>>  	}
>>>>> @@ -3678,6 +3676,39 @@ int do_unregister_con_driver(const struct consw *csw)
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(do_unregister_con_driver);
>>>>>  
>>>>> +static void con_driver_unregister_callback(struct work_struct *ignored)
>>>>> +{
>>>>> +	int i;
>>>>> +
>>>>> +	console_lock();
>>>>> +
>>>>> +	for (i = 0; i < MAX_NR_CON_DRIVER; i++) {
>>>>> +		struct con_driver *con_driver = &registered_con_driver[i];
>>>>> +
>>>>> +		if (!(con_driver->flag & CON_DRIVER_FLAG_ZOMBIE))
>>>>> +			continue;
>>>>> +
>>>>> +		console_unlock();
>>>>> +
>>>>> +		vtconsole_deinit_device(con_driver);
>>>>> +		device_destroy(vtconsole_class, MKDEV(0, con_driver->node));
>>>>> +
>>>>> +		if (WARN_ON_ONCE(con_driver->con))
>>>>> +			con_driver->con = NULL;
>>>>> +		con_driver->desc = NULL;
>>>>> +		con_driver->dev = NULL;
>>>>> +		con_driver->node = 0;
>>>>> +		WARN_ON_ONCE(con_driver->flag != CON_DRIVER_FLAG_ZOMBIE);
>>>>> +		con_driver->flag = 0;
>>>>> +		con_driver->first = 0;
>>>>> +		con_driver->last = 0;
>>>>> +
>>>>> +		console_lock();
>>>>> +	}
>>>>> +
>>>>> +	console_unlock();
>>>>> +}
>>>>> +
>>>>>  /*
>>>>>   *	If we support more console drivers, this function is used
>>>>>   *	when a driver wants to take over some existing consoles
>>>>>
>>>>
>>>
>>>
>>
> 
> 


  reply	other threads:[~2014-12-12 22:45 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-12 16:38 [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Imre Deak
2014-12-12 16:38 ` [PATCH 2/3] vt: fix locking around vt_bind/vt_unbind Imre Deak
2014-12-12 17:53   ` Peter Hurley
2014-12-12 16:38 ` [PATCH 3/3] vt: fix console lock vs. kernfs s_active lock order Imre Deak
2014-12-12 18:32   ` Peter Hurley
2014-12-12 20:29     ` Imre Deak
2014-12-12 20:55       ` Imre Deak
2014-12-12 21:03       ` Peter Hurley
2014-12-12 22:03         ` Imre Deak
2014-12-12 22:45           ` Peter Hurley [this message]
2014-12-12 22:58             ` Imre Deak
2014-12-13 21:14   ` [PATCH v2 " Imre Deak
2014-12-13 23:07     ` [PATCH v3 " Imre Deak
2014-12-12 17:30 ` [PATCH 1/3] vt: fix check for system/busy console drivers when unregistering them Peter Hurley
2014-12-13 21:14 ` [PATCH v2 " Imre Deak
2014-12-15 15:05   ` Daniel Vetter
2014-12-15 16:08     ` Imre Deak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=548B7003.2000502@hurleysoftware.com \
    --to=peter@hurleysoftware.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=gregkh@linuxfoundation.org \
    --cc=imre.deak@intel.com \
    --cc=jslaby@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox