stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Luis R. Rodriguez" <mcgrof@kernel.org>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Gabriel C <nix.or.die@gmail.com>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Arend Van Spriel <arend.vanspriel@broadcom.com>
Subject: Re: [PATCH 4.13 20/27] Revert "firmware: add sanity check on shutdown/suspend"
Date: Tue, 12 Sep 2017 19:20:08 +0200	[thread overview]
Message-ID: <20170912172008.GF16216@wotan.suse.de> (raw)
In-Reply-To: <20170912165310.407586301@linuxfoundation.org>

On Tue, Sep 12, 2017 at 10:00:00AM -0700, Greg Kroah-Hartman wrote:
> 4.13-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Linus Torvalds <torvalds@linux-foundation.org>
> 
> commit f007cad159e99fa2acd3b2e9364fbb32ad28b971 upstream.
> 
> This reverts commit 81f95076281fdd3bc382e004ba1bce8e82fccbce.

I'm not convinced reverting this commit is the right thing to do at
this point given it would seem other errors were happening prior to
v4.13 and this commit would rather just bring to light the core of
the issue which needs to be addressed.

If reverting this commit please consider reverting also commit
06a45a93e7d34a ("firmware: move umh try locks into the umh code").

> It causes random failures of firmware loading at resume time (well,
> random for me, it seems to be more reliable for others)

That randomness is expected, as well as the old issues without this
commit and commit 06a45a93e7d34a. By removing this commit though we
should also be introducing another vector of possible issues though,
and I'm afraid that it may be hard to predict when they were caused
by a race issue on resume.

> because the
> firmware disabling is not actually synchronous with any particular
> resume event, and at least the btusb driver that uses a workqueue to
> load the firmware at resume seems to occasionally hit the "firmware
> loading is disabled" logic because the firmware loader hasn't gotten the
> resume event yet.

It would seem that without these commits the issue was present also:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076

These commits just bring forward the issues closer to attention. They
have my attention now and I am looking at a clean way to address it
with Marcel.

> Some kind of sanity check for not trying to load firmware when it's not
> possible might be a good thing, but this commit was not it.
> 
> Greg seems to have silently suffered the same issue, and pointed to the
> likely culprit, and Gabriel C verified the revert fixed it for him too.

If testing against old behaviour the right way would be to also revert
commit 06a45a93e7d34a ("firmware: move umh try locks into the umh code").

  Luis

> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Pointed-at-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Tested-by: Gabriel C <nix.or.die@gmail.com>
> Cc: Luis R. Rodriguez <mcgrof@kernel.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> ---
>  Documentation/driver-api/firmware/request_firmware.rst |   11 -
>  drivers/base/firmware_class.c                          |   99 -----------------
>  2 files changed, 110 deletions(-)
> 
> --- a/Documentation/driver-api/firmware/request_firmware.rst
> +++ b/Documentation/driver-api/firmware/request_firmware.rst
> @@ -44,17 +44,6 @@ request_firmware_nowait
>  .. kernel-doc:: drivers/base/firmware_class.c
>     :functions: request_firmware_nowait
>  
> -Considerations for suspend and resume
> -=====================================
> -
> -During suspend and resume only the built-in firmware and the firmware cache
> -elements of the firmware API can be used. This is managed by fw_pm_notify().
> -
> -fw_pm_notify
> -------------
> -.. kernel-doc:: drivers/base/firmware_class.c
> -   :functions: fw_pm_notify
> -
>  request firmware API expected driver use
>  ========================================
>  
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -256,38 +256,6 @@ static int fw_cache_piggyback_on_request
>   * guarding for corner cases a global lock should be OK */
>  static DEFINE_MUTEX(fw_lock);
>  
> -static bool __enable_firmware = false;
> -
> -static void enable_firmware(void)
> -{
> -	mutex_lock(&fw_lock);
> -	__enable_firmware = true;
> -	mutex_unlock(&fw_lock);
> -}
> -
> -static void disable_firmware(void)
> -{
> -	mutex_lock(&fw_lock);
> -	__enable_firmware = false;
> -	mutex_unlock(&fw_lock);
> -}
> -
> -/*
> - * When disabled only the built-in firmware and the firmware cache will be
> - * used to look for firmware.
> - */
> -static bool firmware_enabled(void)
> -{
> -	bool enabled = false;
> -
> -	mutex_lock(&fw_lock);
> -	if (__enable_firmware)
> -		enabled = true;
> -	mutex_unlock(&fw_lock);
> -
> -	return enabled;
> -}
> -
>  static struct firmware_cache fw_cache;
>  
>  static struct firmware_buf *__allocate_fw_buf(const char *fw_name,
> @@ -1239,12 +1207,6 @@ _request_firmware(const struct firmware
>  	if (ret <= 0) /* error or already assigned */
>  		goto out;
>  
> -	if (!firmware_enabled()) {
> -		WARN(1, "firmware request while host is not available\n");
> -		ret = -EHOSTDOWN;
> -		goto out;
> -	}
> -
>  	ret = fw_get_filesystem_firmware(device, fw->priv);
>  	if (ret) {
>  		if (!(opt_flags & FW_OPT_NO_WARN))
> @@ -1755,62 +1717,6 @@ static void device_uncache_fw_images_del
>  			   msecs_to_jiffies(delay));
>  }
>  
> -/**
> - * fw_pm_notify - notifier for suspend/resume
> - * @notify_block: unused
> - * @mode: mode we are switching to
> - * @unused: unused
> - *
> - * Used to modify the firmware_class state as we move in between states.
> - * The firmware_class implements a firmware cache to enable device driver
> - * to fetch firmware upon resume before the root filesystem is ready. We
> - * disable API calls which do not use the built-in firmware or the firmware
> - * cache when we know these calls will not work.
> - *
> - * The inner logic behind all this is a bit complex so it is worth summarizing
> - * the kernel's own suspend/resume process with context and focus on how this
> - * can impact the firmware API.
> - *
> - * First a review on how we go to suspend::
> - *
> - *	pm_suspend() --> enter_state() -->
> - *	sys_sync()
> - *	suspend_prepare() -->
> - *		__pm_notifier_call_chain(PM_SUSPEND_PREPARE, ...);
> - *		suspend_freeze_processes() -->
> - *			freeze_processes() -->
> - *				__usermodehelper_set_disable_depth(UMH_DISABLED);
> - *				freeze all tasks ...
> - *			freeze_kernel_threads()
> - *	suspend_devices_and_enter() -->
> - *		dpm_suspend_start() -->
> - *				dpm_prepare()
> - *				dpm_suspend()
> - *		suspend_enter()  -->
> - *			platform_suspend_prepare()
> - *			dpm_suspend_late()
> - *			freeze_enter()
> - *			syscore_suspend()
> - *
> - * When we resume we bail out of a loop from suspend_devices_and_enter() and
> - * unwind back out to the caller enter_state() where we were before as follows::
> - *
> - * 	enter_state() -->
> - *	suspend_devices_and_enter() --> (bail from loop)
> - *		dpm_resume_end() -->
> - *			dpm_resume()
> - *			dpm_complete()
> - *	suspend_finish() -->
> - *		suspend_thaw_processes() -->
> - *			thaw_processes() -->
> - *				__usermodehelper_set_disable_depth(UMH_FREEZING);
> - *				thaw_workqueues();
> - *				thaw all processes ...
> - *				usermodehelper_enable();
> - *		pm_notifier_call_chain(PM_POST_SUSPEND);
> - *
> - * fw_pm_notify() works through pm_notifier_call_chain().
> - */
>  static int fw_pm_notify(struct notifier_block *notify_block,
>  			unsigned long mode, void *unused)
>  {
> @@ -1824,7 +1730,6 @@ static int fw_pm_notify(struct notifier_
>  		 */
>  		kill_pending_fw_fallback_reqs(true);
>  		device_cache_fw_images();
> -		disable_firmware();
>  		break;
>  
>  	case PM_POST_SUSPEND:
> @@ -1837,7 +1742,6 @@ static int fw_pm_notify(struct notifier_
>  		mutex_lock(&fw_lock);
>  		fw_cache.state = FW_LOADER_NO_CACHE;
>  		mutex_unlock(&fw_lock);
> -		enable_firmware();
>  
>  		device_uncache_fw_images_delay(10 * MSEC_PER_SEC);
>  		break;
> @@ -1886,7 +1790,6 @@ static void __init fw_cache_init(void)
>  static int fw_shutdown_notify(struct notifier_block *unused1,
>  			      unsigned long unused2, void *unused3)
>  {
> -	disable_firmware();
>  	/*
>  	 * Kill all pending fallback requests to avoid both stalling shutdown,
>  	 * and avoid a deadlock with the usermode_lock.
> @@ -1902,7 +1805,6 @@ static struct notifier_block fw_shutdown
>  
>  static int __init firmware_class_init(void)
>  {
> -	enable_firmware();
>  	fw_cache_init();
>  	register_reboot_notifier(&fw_shutdown_nb);
>  #ifdef CONFIG_FW_LOADER_USER_HELPER
> @@ -1914,7 +1816,6 @@ static int __init firmware_class_init(vo
>  
>  static void __exit firmware_class_exit(void)
>  {
> -	disable_firmware();
>  #ifdef CONFIG_PM_SLEEP
>  	unregister_syscore_ops(&fw_syscore_ops);
>  	unregister_pm_notifier(&fw_cache.pm_notify);
> 
> 
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

  reply	other threads:[~2017-09-12 17:20 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-12 16:59 [PATCH 4.13 00/27] 4.13.2-stable review Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 02/27] mtd: nand: hynix: add support for 20nm NAND chips Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 03/27] mtd: nand: mxc: Fix mxc_v1 ooblayout Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 04/27] mtd: nand: qcom: fix read failure without complete bootchain Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 05/27] mtd: nand: qcom: fix config error for BCH Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 06/27] nvme-fabrics: generate spec-compliant UUID NQNs Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 07/27] btrfs: resume qgroup rescan on rw remount Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 08/27] rtlwifi: btcoexist: Fix breakage of ant_sel for rtl8723be Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 09/27] rtlwifi: btcoexist: Fix antenna selection code Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 10/27] radix-tree: must check __radix_tree_preload() return value Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 11/27] brcmfmac: feature check for multi-scheduled scan fails on bcm4345 devices Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 12/27] kselftests: timers: leap-a-day: Change default arguments to help test runs Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 13/27] selftests: timers: Fix run_destructive_tests target to handle skipped tests Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 14/27] selftests/x86/fsgsbase: Test selectors 1, 2, and 3 Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 15/27] mm: kvfree the swap cluster info if the swap file is unsatisfactory Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 16/27] mm/swapfile.c: fix swapon frontswap_map memory leak on error Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 17/27] mm/sparse.c: fix typo in online_mem_sections Greg Kroah-Hartman
2017-09-12 16:59 ` [PATCH 4.13 18/27] mm/memory.c: fix mem_cgroup_oom_disable() call missing Greg Kroah-Hartman
2017-09-12 17:00 ` [PATCH 4.13 20/27] Revert "firmware: add sanity check on shutdown/suspend" Greg Kroah-Hartman
2017-09-12 17:20   ` Luis R. Rodriguez [this message]
2017-09-13  0:47     ` Greg Kroah-Hartman
2017-09-13  1:22       ` Luis R. Rodriguez
2017-09-13  1:40         ` Greg Kroah-Hartman
2017-09-13  4:11       ` Linus Torvalds
2017-09-13 18:38         ` Luis R. Rodriguez
2017-09-13 19:30           ` Linus Torvalds
2017-09-13 21:44             ` Luis R. Rodriguez
2017-09-12 17:00 ` [PATCH 4.13 21/27] rt2800: fix TX_PIN_CFG setting for non MT7620 chips Greg Kroah-Hartman
2017-09-12 17:00 ` [PATCH 4.13 22/27] Bluetooth: Properly check L2CAP config option output buffer length Greg Kroah-Hartman
2017-09-12 17:00 ` [PATCH 4.13 23/27] ARM64: dts: marvell: armada-37xx: Fix GIC maintenance interrupt Greg Kroah-Hartman
2017-09-12 17:00 ` [PATCH 4.13 24/27] ARM: 8692/1: mm: abort uaccess retries upon fatal signal Greg Kroah-Hartman
2017-09-12 17:00 ` [PATCH 4.13 25/27] NFS: Fix 2 use after free issues in the I/O code Greg Kroah-Hartman
2017-09-12 17:00 ` [PATCH 4.13 26/27] NFS: Sync the correct byte range during synchronous writes Greg Kroah-Hartman
2017-09-12 17:00 ` [PATCH 4.13 27/27] NFSv4: Fix up mirror allocation Greg Kroah-Hartman
2017-09-13  0:13 ` [PATCH 4.13 00/27] 4.13.2-stable review Shuah Khan
2017-09-13  0:57   ` Greg Kroah-Hartman
     [not found] ` <59b8666e.e2a9df0a.378a2.e18c@mx.google.com>
2017-09-13  1:03   ` Greg Kroah-Hartman
2017-09-13 22:16     ` Kevin Hilman
2017-09-13 14:35 ` Guenter Roeck
2017-09-13 19:01   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170912172008.GF16216@wotan.suse.de \
    --to=mcgrof@kernel.org \
    --cc=arend.vanspriel@broadcom.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nix.or.die@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).