* [PATCH 4.16 REGRESSION fix 0/2] Bluetooth: Fix hci_bcm BT devices getting stuck in runtime-suspended status @ 2018-03-14 22:06 Hans de Goede 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" Hans de Goede 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters Hans de Goede 0 siblings, 2 replies; 15+ messages in thread From: Hans de Goede @ 2018-03-14 22:06 UTC (permalink / raw) To: Marcel Holtmann, Gustavo Padovan, Johan Hedberg Cc: Hans de Goede, Frédéric Danis, Lukas Wunner, linux-bluetooth, linux-serial, linux-acpi Hi All, See the individual commit messages for the what and why of this series only patch 1/2 is really necessary to fix the regression. The regression changes a theoretical race window in the hci_bcm IRQ handling into a real one and the first patch reverts the offending commit making the race window a theoretical thing again (we would need a 5 second stall in the right place to trigger it). The second patch in the series contains a simple fix closing the theoretical race window, this was tested with the regression still in place and it does successfully get things unstuck after hitting the race. I guess this can wait for 4.17, but I'm sending this out as a series as the patches were developed in tandem. This series was tested on an Asus T100CHI with a bcm43241b4 wifi/bt combo chip. Regards, Hans ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-14 22:06 [PATCH 4.16 REGRESSION fix 0/2] Bluetooth: Fix hci_bcm BT devices getting stuck in runtime-suspended status Hans de Goede @ 2018-03-14 22:06 ` Hans de Goede 2018-03-14 22:16 ` Lukas Wunner ` (2 more replies) 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters Hans de Goede 1 sibling, 3 replies; 15+ messages in thread From: Hans de Goede @ 2018-03-14 22:06 UTC (permalink / raw) To: Marcel Holtmann, Gustavo Padovan, Johan Hedberg Cc: Hans de Goede, Frédéric Danis, Lukas Wunner, linux-bluetooth, linux-serial, linux-acpi This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime PM code"). The commit msg for this commit states "No functional change intended.", but replacing: pm_runtime_get(); pm_runtime_mark_last_busy(); pm_runtime_put_autosuspend(); with: pm_request_resume(); Does result in a functional change, pm_request_resume() only calls pm_runtime_mark_last_busy() if the device was suspended before the call. This results in the following happening: 1) Device is runtime suspended 2) Device drives host_wake IRQ logically high as it starts receiving data 3) bcm_host_wake() gets called, causes the device to runtime-resume, current time gets marked as last_busy time 4) After 5 seconds the autosuspend timer expires and the dev autosuspends as no one has been calling pm_runtime_mark_last_busy(), the device was resumed during those 5 seconds, so all the pm_request_resume() calls while receiving data and/or bcm_host_wake() calls were nops 5) If 4) happens while the device has (just received) data in its buffer to be read by the host the IRQ line is *already* / still logically high when we autosuspend and since we use an edge triggered IRQ, the IRQ will never trigger, causing the device to get stuck in suspend Therefor this commit has to be reverted, so that we avoid the device getting stuck in suspend. Cc: Lukas Wunner <lukas@wunner.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> --- drivers/bluetooth/hci_bcm.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c index 59804a35488c..b089012a49f9 100644 --- a/drivers/bluetooth/hci_bcm.c +++ b/drivers/bluetooth/hci_bcm.c @@ -248,7 +248,9 @@ static irqreturn_t bcm_host_wake(int irq, void *data) bt_dev_dbg(bdev, "Host wake IRQ"); - pm_request_resume(bdev->dev); + pm_runtime_get(bdev->dev); + pm_runtime_mark_last_busy(bdev->dev); + pm_runtime_put_autosuspend(bdev->dev); return IRQ_HANDLED; } @@ -590,8 +592,11 @@ static int bcm_recv(struct hci_uart *hu, const void *data, int count) } else if (!bcm->rx_skb) { /* Delay auto-suspend when receiving completed packet */ mutex_lock(&bcm_device_lock); - if (bcm->dev && bcm_device_exists(bcm->dev)) - pm_request_resume(bcm->dev->dev); + if (bcm->dev && bcm_device_exists(bcm->dev)) { + pm_runtime_get(bcm->dev->dev); + pm_runtime_mark_last_busy(bcm->dev->dev); + pm_runtime_put_autosuspend(bcm->dev->dev); + } mutex_unlock(&bcm_device_lock); } -- 2.14.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" Hans de Goede @ 2018-03-14 22:16 ` Lukas Wunner 2018-03-14 22:23 ` Hans de Goede 2018-03-15 8:32 ` Lukas Wunner 2018-03-15 18:40 ` Marcel Holtmann 2 siblings, 1 reply; 15+ messages in thread From: Lukas Wunner @ 2018-03-14 22:16 UTC (permalink / raw) To: Hans de Goede Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi, Robert R. Howell On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote: > This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime > PM code"). The commit msg for this commit states "No functional change > intended.", but replacing: > > pm_runtime_get(); > pm_runtime_mark_last_busy(); > pm_runtime_put_autosuspend(); > > with: > > pm_request_resume(); > > Does result in a functional change, pm_request_resume() only calls > pm_runtime_mark_last_busy() if the device was suspended before the call. Yes, Robert Howell (cc) reported this a few days ago: https://bugzilla.kernel.org/show_bug.cgi?id=198953 I've worked with him to develop a fix which is better IMHO than a revert, namely he's replacing the pm_request_resume() in bcm_recv() with pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt handler can stay. He says that fixes the issue for him. I hope he'll submit the patch shortly. Thanks, Lukas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-14 22:16 ` Lukas Wunner @ 2018-03-14 22:23 ` Hans de Goede 2018-03-14 22:38 ` Lukas Wunner 0 siblings, 1 reply; 15+ messages in thread From: Hans de Goede @ 2018-03-14 22:23 UTC (permalink / raw) To: Lukas Wunner Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi, Robert R. Howell Hi, On 14-03-18 23:16, Lukas Wunner wrote: > On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote: >> This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime >> PM code"). The commit msg for this commit states "No functional change >> intended.", but replacing: >> >> pm_runtime_get(); >> pm_runtime_mark_last_busy(); >> pm_runtime_put_autosuspend(); >> >> with: >> >> pm_request_resume(); >> >> Does result in a functional change, pm_request_resume() only calls >> pm_runtime_mark_last_busy() if the device was suspended before the call. > > Yes, Robert Howell (cc) reported this a few days ago: > https://bugzilla.kernel.org/show_bug.cgi?id=198953 > > I've worked with him to develop a fix which is better IMHO than a revert, > namely he's replacing the pm_request_resume() in bcm_recv() with > pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt > handler can stay. He says that fixes the issue for him. It makes the race window a lot smaller, but it still leaves a race: 1) some data comes in, gets full read from the device 2) 4.9999 seconds elapse since last byte has been read 3) new data comes in, triggers IRQ, IRQ does nothing because runtime suspend has not yet kicked in 4) runtime suspend kicks in, disabling the uart before the first new byte is received 5) stuck again > I hope he'll submit the patch shortly. We're quite far into the cycle already and this is a serious regression, also nothing of great value is lost by the revert, the original commit was a minor cleanup which turns out to have bad side-effects, a simple revert really is the best solution here, esp. in this point of the cycle. Regards, Hans ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-14 22:23 ` Hans de Goede @ 2018-03-14 22:38 ` Lukas Wunner 2018-03-15 7:49 ` Hans de Goede 0 siblings, 1 reply; 15+ messages in thread From: Lukas Wunner @ 2018-03-14 22:38 UTC (permalink / raw) To: Hans de Goede Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi, Robert R. Howell On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote: > On 14-03-18 23:16, Lukas Wunner wrote: > >On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote: > >>This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime > >>PM code"). The commit msg for this commit states "No functional change > >>intended.", but replacing: > >> > >> pm_runtime_get(); > >> pm_runtime_mark_last_busy(); > >> pm_runtime_put_autosuspend(); > >> > >>with: > >> > >> pm_request_resume(); > >> > >>Does result in a functional change, pm_request_resume() only calls > >>pm_runtime_mark_last_busy() if the device was suspended before the call. > > > >Yes, Robert Howell (cc) reported this a few days ago: > >https://bugzilla.kernel.org/show_bug.cgi?id=198953 > > > >I've worked with him to develop a fix which is better IMHO than a revert, > >namely he's replacing the pm_request_resume() in bcm_recv() with > >pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt > >handler can stay. He says that fixes the issue for him. > > It makes the race window a lot smaller, but it still leaves a race: > > 1) some data comes in, gets full read from the device > 2) 4.9999 seconds elapse since last byte has been read > 3) new data comes in, triggers IRQ, IRQ does nothing because runtime suspend > has not yet kicked in > 4) runtime suspend kicks in, disabling the uart before the first new byte is received > 5) stuck again Hm okay, but a call to pm_runtime_mark_last_busy() before the pm_request_resume() should avoid that. Actually I'm wondering why we're not calling pm_runtime_mark_last_busy() in rpm_resume() if the device was already resumed as clearly an action is requested from it. That needs to be investigated separately. > >I hope he'll submit the patch shortly. > > We're quite far into the cycle already and this is a serious regression, > also nothing of great value is lost by the revert, the original commit > was a minor cleanup which turns out to have bad side-effects, a simple > revert really is the best solution here, esp. in this point of the cycle. Just an hour ago he sent me the patch to look over it. And we're at least two and a half weeks away from v4.16. Thanks, Lukas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-14 22:38 ` Lukas Wunner @ 2018-03-15 7:49 ` Hans de Goede 2018-03-15 8:14 ` Lukas Wunner 0 siblings, 1 reply; 15+ messages in thread From: Hans de Goede @ 2018-03-15 7:49 UTC (permalink / raw) To: Lukas Wunner Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi, Robert R. Howell Hi, On 14-03-18 23:38, Lukas Wunner wrote: > On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote: >> On 14-03-18 23:16, Lukas Wunner wrote: >>> On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote: >>>> This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime >>>> PM code"). The commit msg for this commit states "No functional change >>>> intended.", but replacing: >>>> >>>> pm_runtime_get(); >>>> pm_runtime_mark_last_busy(); >>>> pm_runtime_put_autosuspend(); >>>> >>>> with: >>>> >>>> pm_request_resume(); >>>> >>>> Does result in a functional change, pm_request_resume() only calls >>>> pm_runtime_mark_last_busy() if the device was suspended before the call. >>> >>> Yes, Robert Howell (cc) reported this a few days ago: >>> https://bugzilla.kernel.org/show_bug.cgi?id=198953 >>> >>> I've worked with him to develop a fix which is better IMHO than a revert, >>> namely he's replacing the pm_request_resume() in bcm_recv() with >>> pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt >>> handler can stay. He says that fixes the issue for him. >> >> It makes the race window a lot smaller, but it still leaves a race: >> >> 1) some data comes in, gets full read from the device >> 2) 4.9999 seconds elapse since last byte has been read >> 3) new data comes in, triggers IRQ, IRQ does nothing because runtime suspend >> has not yet kicked in >> 4) runtime suspend kicks in, disabling the uart before the first new byte is received >> 5) stuck again > > Hm okay, but a call to pm_runtime_mark_last_busy() before the > pm_request_resume() should avoid that. Actually I'm wondering > why we're not calling pm_runtime_mark_last_busy() in rpm_resume() > if the device was already resumed as clearly an action is requested > from it. That needs to be investigated separately. > >>> I hope he'll submit the patch shortly. >> >> We're quite far into the cycle already and this is a serious regression, >> also nothing of great value is lost by the revert, the original commit >> was a minor cleanup which turns out to have bad side-effects, a simple >> revert really is the best solution here, esp. in this point of the cycle. > > Just an hour ago he sent me the patch to look over it. And we're at > least two and a half weeks away from v4.16. No we are *only* two and a half weeks away from v4.16 (worst case scenario) and Linus does not like getting last minute fixes. I really so no good reason to not fix this with a simple revert, esp. since as my explanation of the race condition in the fix he send you shows, getting this right is non trivial. Falling back to the code before the troublesome commit gives us a known working solution, at 0 cost (as the reverted commit was just a code cleanup, no functionality is lost). Anyways this is Marcel's call now. Regards, Hans ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-15 7:49 ` Hans de Goede @ 2018-03-15 8:14 ` Lukas Wunner 2018-03-15 10:23 ` Hans de Goede 2018-03-15 13:15 ` Marcel Holtmann 0 siblings, 2 replies; 15+ messages in thread From: Lukas Wunner @ 2018-03-15 8:14 UTC (permalink / raw) To: Hans de Goede Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi, Robert R. Howell On Thu, Mar 15, 2018 at 08:49:04AM +0100, Hans de Goede wrote: > On 14-03-18 23:38, Lukas Wunner wrote: > > On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote: > > >We're quite far into the cycle already and this is a serious regression, > > >also nothing of great value is lost by the revert, the original commit > > >was a minor cleanup which turns out to have bad side-effects, a simple > > >revert really is the best solution here, esp. in this point of the cycle. > > > > Just an hour ago he sent me the patch to look over it. And we're at > > least two and a half weeks away from v4.16. > > No we are *only* two and a half weeks away from v4.16 (worst case scenario) > and Linus does not like getting last minute fixes. That doesn't preclude allowing a few hours to discuss things. There is never such a rush. In the present case, a new contributor was willing to debug the issue and submit a patch. Onboarding new contributors is important and IMO it's worth waiting a few days for them to sort things out, even if it means a regression stays present a little longer. I'm sorry that it meant you wasted time debugging it in parallel. That said, when submitting the patch I clearly failed to notice that for devices using autosuspend, pm_request_resume() doesn't update the last usage timestamp. While that could be fixed by calling pm_runtime_mark_last_busy() before pm_request_resume(), it doesn't seem to be customary as a look at all the call sites of pm_request_resume() shows. The original three-line sequence, although quite verbose, appears to be what is commonly used in such a case. For this reason reverting back to the original version seems justified. Thanks, Lukas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-15 8:14 ` Lukas Wunner @ 2018-03-15 10:23 ` Hans de Goede 2018-03-15 13:15 ` Marcel Holtmann 1 sibling, 0 replies; 15+ messages in thread From: Hans de Goede @ 2018-03-15 10:23 UTC (permalink / raw) To: Lukas Wunner Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi, Robert R. Howell Hi, On 15-03-18 09:14, Lukas Wunner wrote: > On Thu, Mar 15, 2018 at 08:49:04AM +0100, Hans de Goede wrote: >> On 14-03-18 23:38, Lukas Wunner wrote: >>> On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote: >>>> We're quite far into the cycle already and this is a serious regression, >>>> also nothing of great value is lost by the revert, the original commit >>>> was a minor cleanup which turns out to have bad side-effects, a simple >>>> revert really is the best solution here, esp. in this point of the cycle. >>> >>> Just an hour ago he sent me the patch to look over it. And we're at >>> least two and a half weeks away from v4.16. >> >> No we are *only* two and a half weeks away from v4.16 (worst case scenario) >> and Linus does not like getting last minute fixes. > > That doesn't preclude allowing a few hours to discuss things. > There is never such a rush. In the present case, a new contributor > was willing to debug the issue and submit a patch. Onboarding new > contributors is important and IMO it's worth waiting a few days for > them to sort things out, even if it means a regression stays present > a little longer. I completely agree that onboarding new contributors is important. Regards, Hans ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-15 8:14 ` Lukas Wunner 2018-03-15 10:23 ` Hans de Goede @ 2018-03-15 13:15 ` Marcel Holtmann 2018-03-15 13:49 ` Hans de Goede 1 sibling, 1 reply; 15+ messages in thread From: Marcel Holtmann @ 2018-03-15 13:15 UTC (permalink / raw) To: Lukas Wunner Cc: Hans de Goede, Gustavo F. Padovan, Johan Hedberg, Frédéric Danis, Linux Bluetooth mailing list, linux-serial, linux-acpi, Robert R. Howell Hi Lukas, >>>> We're quite far into the cycle already and this is a serious regression, >>>> also nothing of great value is lost by the revert, the original commit >>>> was a minor cleanup which turns out to have bad side-effects, a simple >>>> revert really is the best solution here, esp. in this point of the cycle. >>> >>> Just an hour ago he sent me the patch to look over it. And we're at >>> least two and a half weeks away from v4.16. >> >> No we are *only* two and a half weeks away from v4.16 (worst case scenario) >> and Linus does not like getting last minute fixes. > > That doesn't preclude allowing a few hours to discuss things. > There is never such a rush. In the present case, a new contributor > was willing to debug the issue and submit a patch. Onboarding new > contributors is important and IMO it's worth waiting a few days for > them to sort things out, even if it means a regression stays present > a little longer. I'm sorry that it meant you wasted time debugging > it in parallel. > > That said, when submitting the patch I clearly failed to notice that > for devices using autosuspend, pm_request_resume() doesn't update > the last usage timestamp. While that could be fixed by calling > pm_runtime_mark_last_busy() before pm_request_resume(), it doesn't > seem to be customary as a look at all the call sites of > pm_request_resume() shows. The original three-line sequence, > although quite verbose, appears to be what is commonly used in such > a case. For this reason reverting back to the original version > seems justified. there is no reason to rush this through. With a properly worded commit message that explain the reason, I have no problem to do a last minute -rc inclusion. However what I like to have is a single patch with all Acks and also CC: stable tags if required that we can just send off in the direction towards Linus. I saw there is an alternative patch on the mailing list. So I would like to have a good conclusion on what goes to -rc and that we all agree. Regards Marcel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-15 13:15 ` Marcel Holtmann @ 2018-03-15 13:49 ` Hans de Goede 0 siblings, 0 replies; 15+ messages in thread From: Hans de Goede @ 2018-03-15 13:49 UTC (permalink / raw) To: Marcel Holtmann, Lukas Wunner Cc: Gustavo F. Padovan, Johan Hedberg, Frédéric Danis, Linux Bluetooth mailing list, linux-serial, linux-acpi, Robert R. Howell Hi, On 15-03-18 14:15, Marcel Holtmann wrote: > Hi Lukas, > >>>>> We're quite far into the cycle already and this is a serious regression, >>>>> also nothing of great value is lost by the revert, the original commit >>>>> was a minor cleanup which turns out to have bad side-effects, a simple >>>>> revert really is the best solution here, esp. in this point of the cycle. >>>> >>>> Just an hour ago he sent me the patch to look over it. And we're at >>>> least two and a half weeks away from v4.16. >>> >>> No we are *only* two and a half weeks away from v4.16 (worst case scenario) >>> and Linus does not like getting last minute fixes. >> >> That doesn't preclude allowing a few hours to discuss things. >> There is never such a rush. In the present case, a new contributor >> was willing to debug the issue and submit a patch. Onboarding new >> contributors is important and IMO it's worth waiting a few days for >> them to sort things out, even if it means a regression stays present >> a little longer. I'm sorry that it meant you wasted time debugging >> it in parallel. >> >> That said, when submitting the patch I clearly failed to notice that >> for devices using autosuspend, pm_request_resume() doesn't update >> the last usage timestamp. While that could be fixed by calling >> pm_runtime_mark_last_busy() before pm_request_resume(), it doesn't >> seem to be customary as a look at all the call sites of >> pm_request_resume() shows. The original three-line sequence, >> although quite verbose, appears to be what is commonly used in such >> a case. For this reason reverting back to the original version >> seems justified. > > there is no reason to rush this through. With a properly worded commit message that explain the reason, I have no problem to do a last minute -rc inclusion. > > However what I like to have is a single patch with all Acks and also CC: stable tags if required that we can just send off in the direction towards Linus. I saw there is an alternative patch on the mailing list. So I would like to have a good conclusion on what goes to -rc and that we all agree. I think we are all in agreement now to just go with the revert patch which I did, Lucas acked that patch this morning. Regards, Hans ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" Hans de Goede 2018-03-14 22:16 ` Lukas Wunner @ 2018-03-15 8:32 ` Lukas Wunner 2018-03-15 18:40 ` Marcel Holtmann 2 siblings, 0 replies; 15+ messages in thread From: Lukas Wunner @ 2018-03-15 8:32 UTC (permalink / raw) To: Hans de Goede Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote: > This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime > PM code"). The commit msg for this commit states "No functional change > intended.", but replacing: > > pm_runtime_get(); > pm_runtime_mark_last_busy(); > pm_runtime_put_autosuspend(); > > with: > > pm_request_resume(); > > Does result in a functional change, pm_request_resume() only calls > pm_runtime_mark_last_busy() if the device was suspended before the call. > > This results in the following happening: > > 1) Device is runtime suspended > 2) Device drives host_wake IRQ logically high as it starts receiving data > 3) bcm_host_wake() gets called, causes the device to runtime-resume, > current time gets marked as last_busy time > 4) After 5 seconds the autosuspend timer expires and the dev autosuspends > as no one has been calling pm_runtime_mark_last_busy(), the device was > resumed during those 5 seconds, so all the pm_request_resume() calls > while receiving data and/or bcm_host_wake() calls were nops > 5) If 4) happens while the device has (just received) data in its buffer to > be read by the host the IRQ line is *already* / still logically high > when we autosuspend and since we use an edge triggered IRQ, the IRQ > will never trigger, causing the device to get stuck in suspend > > Therefor this commit has to be reverted, so that we avoid the device > getting stuck in suspend. > > Cc: Lukas Wunner <lukas@wunner.de> > Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Lukas Wunner <lukas@wunner.de> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" Hans de Goede 2018-03-14 22:16 ` Lukas Wunner 2018-03-15 8:32 ` Lukas Wunner @ 2018-03-15 18:40 ` Marcel Holtmann 2 siblings, 0 replies; 15+ messages in thread From: Marcel Holtmann @ 2018-03-15 18:40 UTC (permalink / raw) To: Hans de Goede Cc: Gustavo F. Padovan, Johan Hedberg, Frédéric Danis, Lukas Wunner, linux-bluetooth, linux-serial, linux-acpi Hi Hans, > This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime > PM code"). The commit msg for this commit states "No functional change > intended.", but replacing: > > pm_runtime_get(); > pm_runtime_mark_last_busy(); > pm_runtime_put_autosuspend(); > > with: > > pm_request_resume(); > > Does result in a functional change, pm_request_resume() only calls > pm_runtime_mark_last_busy() if the device was suspended before the call. > > This results in the following happening: > > 1) Device is runtime suspended > 2) Device drives host_wake IRQ logically high as it starts receiving data > 3) bcm_host_wake() gets called, causes the device to runtime-resume, > current time gets marked as last_busy time > 4) After 5 seconds the autosuspend timer expires and the dev autosuspends > as no one has been calling pm_runtime_mark_last_busy(), the device was > resumed during those 5 seconds, so all the pm_request_resume() calls > while receiving data and/or bcm_host_wake() calls were nops > 5) If 4) happens while the device has (just received) data in its buffer to > be read by the host the IRQ line is *already* / still logically high > when we autosuspend and since we use an edge triggered IRQ, the IRQ > will never trigger, causing the device to get stuck in suspend > > Therefor this commit has to be reverted, so that we avoid the device > getting stuck in suspend. > > Cc: Lukas Wunner <lukas@wunner.de> > Signed-off-by: Hans de Goede <hdegoede@redhat.com> > --- > drivers/bluetooth/hci_bcm.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) patch has been applied to bluetooth-stable tree. Regards Marcel ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters 2018-03-14 22:06 [PATCH 4.16 REGRESSION fix 0/2] Bluetooth: Fix hci_bcm BT devices getting stuck in runtime-suspended status Hans de Goede 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" Hans de Goede @ 2018-03-14 22:06 ` Hans de Goede 2018-03-14 22:22 ` Lukas Wunner 2018-03-15 18:40 ` Marcel Holtmann 1 sibling, 2 replies; 15+ messages in thread From: Hans de Goede @ 2018-03-14 22:06 UTC (permalink / raw) To: Marcel Holtmann, Gustavo Padovan, Johan Hedberg Cc: Hans de Goede, Frédéric Danis, Lukas Wunner, linux-bluetooth, linux-serial, linux-acpi The IRQ output of the bcm bt-device is really a level IRQ signal, which signals a logical high as long as the device's buffer contains data. Since the draining in the buffer is done in the tty driver, we cannot (easily) wait in a threaded interrupt handler for the draining, after which the IRQ should go low again. So instead we treat the IRQ as an edge interrupt. This opens the window for a theoretical race where we wakeup, read some data and then autosuspend *before* the IRQ has gone (logical) low, followed by the device just at that moment receiving more data, causing the IRQ to stay high and we never see an edge. Since we call pm_runtime_mark_last_busy() on every received byte, there should be plenty time for the IRQ to go (logical) low before we ever suspend, so this should never happen, but after commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted since, this was actually happening causing the device to get stuck in runtime suspend. The bcm bt-device actually has a workaround for this, if we set the pulsed_host_wake flag in the sleep parameters, then the device monitors if the host is draining the buffer and if not then after a timeout the device will pulse the IRQ line, causing us to see an edge, fixing the stuck in suspend condition. This commit sets the pulsed_host_wake flag to fix the (mostly theoretical) race caused by us treating the IRQ as an edge IRQ. Signed-off-by: Hans de Goede <hdegoede@redhat.com> --- drivers/bluetooth/hci_bcm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c index b089012a49f9..04b9a6c75c75 100644 --- a/drivers/bluetooth/hci_bcm.c +++ b/drivers/bluetooth/hci_bcm.c @@ -307,7 +307,7 @@ static const struct bcm_set_sleep_mode default_sleep_params = { .usb_auto_sleep = 0, .usb_resume_timeout = 0, .break_to_host = 0, - .pulsed_host_wake = 0, + .pulsed_host_wake = 1, }; static int bcm_setup_sleep(struct hci_uart *hu) -- 2.14.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters Hans de Goede @ 2018-03-14 22:22 ` Lukas Wunner 2018-03-15 18:40 ` Marcel Holtmann 1 sibling, 0 replies; 15+ messages in thread From: Lukas Wunner @ 2018-03-14 22:22 UTC (permalink / raw) To: Hans de Goede Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Frédéric Danis, linux-bluetooth, linux-serial, linux-acpi On Wed, Mar 14, 2018 at 11:06:03PM +0100, Hans de Goede wrote: > The IRQ output of the bcm bt-device is really a level IRQ signal, which > signals a logical high as long as the device's buffer contains data. Since > the draining in the buffer is done in the tty driver, we cannot (easily) > wait in a threaded interrupt handler for the draining, after which the > IRQ should go low again. > > So instead we treat the IRQ as an edge interrupt. This opens the window > for a theoretical race where we wakeup, read some data and then autosuspend > *before* the IRQ has gone (logical) low, followed by the device just at > that moment receiving more data, causing the IRQ to stay high and we never > see an edge. > > Since we call pm_runtime_mark_last_busy() on every received byte, there > should be plenty time for the IRQ to go (logical) low before we ever > suspend, so this should never happen, but after commit 43fff7683468 > ("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted > since, this was actually happening causing the device to get stuck in > runtime suspend. > > The bcm bt-device actually has a workaround for this, if we set the > pulsed_host_wake flag in the sleep parameters, then the device monitors > if the host is draining the buffer and if not then after a timeout the > device will pulse the IRQ line, causing us to see an edge, fixing the > stuck in suspend condition. > > This commit sets the pulsed_host_wake flag to fix the (mostly theoretical) > race caused by us treating the IRQ as an edge IRQ. > > Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Good work, thanks. > --- > drivers/bluetooth/hci_bcm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c > index b089012a49f9..04b9a6c75c75 100644 > --- a/drivers/bluetooth/hci_bcm.c > +++ b/drivers/bluetooth/hci_bcm.c > @@ -307,7 +307,7 @@ static const struct bcm_set_sleep_mode default_sleep_params = { > .usb_auto_sleep = 0, > .usb_resume_timeout = 0, > .break_to_host = 0, > - .pulsed_host_wake = 0, > + .pulsed_host_wake = 1, > }; > > static int bcm_setup_sleep(struct hci_uart *hu) > -- > 2.14.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters Hans de Goede 2018-03-14 22:22 ` Lukas Wunner @ 2018-03-15 18:40 ` Marcel Holtmann 1 sibling, 0 replies; 15+ messages in thread From: Marcel Holtmann @ 2018-03-15 18:40 UTC (permalink / raw) To: Hans de Goede Cc: Gustavo F. Padovan, Johan Hedberg, Frédéric Danis, Lukas Wunner, linux-bluetooth, linux-serial, linux-acpi Hi Hans, > The IRQ output of the bcm bt-device is really a level IRQ signal, which > signals a logical high as long as the device's buffer contains data. Since > the draining in the buffer is done in the tty driver, we cannot (easily) > wait in a threaded interrupt handler for the draining, after which the > IRQ should go low again. > > So instead we treat the IRQ as an edge interrupt. This opens the window > for a theoretical race where we wakeup, read some data and then autosuspend > *before* the IRQ has gone (logical) low, followed by the device just at > that moment receiving more data, causing the IRQ to stay high and we never > see an edge. > > Since we call pm_runtime_mark_last_busy() on every received byte, there > should be plenty time for the IRQ to go (logical) low before we ever > suspend, so this should never happen, but after commit 43fff7683468 > ("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted > since, this was actually happening causing the device to get stuck in > runtime suspend. > > The bcm bt-device actually has a workaround for this, if we set the > pulsed_host_wake flag in the sleep parameters, then the device monitors > if the host is draining the buffer and if not then after a timeout the > device will pulse the IRQ line, causing us to see an edge, fixing the > stuck in suspend condition. > > This commit sets the pulsed_host_wake flag to fix the (mostly theoretical) > race caused by us treating the IRQ as an edge IRQ. > > Signed-off-by: Hans de Goede <hdegoede@redhat.com> > --- > drivers/bluetooth/hci_bcm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) patch has been applied to bluetooth-stable tree. Regards Marcel ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-03-15 18:40 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-03-14 22:06 [PATCH 4.16 REGRESSION fix 0/2] Bluetooth: Fix hci_bcm BT devices getting stuck in runtime-suspended status Hans de Goede 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code" Hans de Goede 2018-03-14 22:16 ` Lukas Wunner 2018-03-14 22:23 ` Hans de Goede 2018-03-14 22:38 ` Lukas Wunner 2018-03-15 7:49 ` Hans de Goede 2018-03-15 8:14 ` Lukas Wunner 2018-03-15 10:23 ` Hans de Goede 2018-03-15 13:15 ` Marcel Holtmann 2018-03-15 13:49 ` Hans de Goede 2018-03-15 8:32 ` Lukas Wunner 2018-03-15 18:40 ` Marcel Holtmann 2018-03-14 22:06 ` [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters Hans de Goede 2018-03-14 22:22 ` Lukas Wunner 2018-03-15 18:40 ` Marcel Holtmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox