Intel-Wired-Lan Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
       [not found]   ` <e6f227ee701e1ee37e8f568b1310d240a2b8935a.camel@sipsolutions.net>
@ 2023-12-07 10:16     ` Przemek Kitszel
  2023-12-07 17:40       ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Przemek Kitszel @ 2023-12-07 10:16 UTC (permalink / raw)
  To: Johannes Berg, Jakub Kicinski
  Cc: netdev, Marc MERLIN, intel-wired-lan@lists.osuosl.org,
	Heiner Kallweit

On 12/6/23 17:46, Johannes Berg wrote:
> On Wed, 2023-12-06 at 08:44 -0800, Jakub Kicinski wrote:
>> On Wed,  6 Dec 2023 11:39:32 +0100 Johannes Berg wrote:
>>> As reported by Marc MERLIN, at least one driver (igc) wants or
>>> needs to acquire the RTNL inside suspend/resume ops, which can
>>> be called from here in ethtool if runtime PM is enabled.
>>>
>>> Allow this by doing runtime PM transitions without the RTNL
>>> held. For the ioctl to have the same operations order, this
>>> required reworking the code to separately check validity and
>>> do the operation. For the netlink code, this now has to do
>>> the runtime_pm_put a bit later.
>>
>> I was really, really hoping that this would serve as a motivation
>> for Intel to sort out the igb/igc implementation. The flow AFAICT
>> is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC
>> back down immediately (!?) then it schedules a link check from a work
>> queue, which opens it again (!?). It's a source of never ending bugs.
>>
> 
> Well, I work there, but ... WiFi something else entirely. Marc just got
> lucky I spotted an issue in the logs ;-)
> 
> I'll let you guys take it from here ...
> 
> johannes
> 

I have let know our igc TL, architect, and anybody that could be
interested via cc: IWL.
And I'm happy that this could be done at relaxed pace thanks to Johannes
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-07 10:16     ` [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL Przemek Kitszel
@ 2023-12-07 17:40       ` Jakub Kicinski
  2023-12-11  4:52         ` Marc MERLIN
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-12-07 17:40 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: netdev, Johannes Berg, Marc MERLIN,
	intel-wired-lan@lists.osuosl.org, Heiner Kallweit

On Thu, 7 Dec 2023 11:16:10 +0100 Przemek Kitszel wrote:
> I have let know our igc TL, architect, and anybody that could be
> interested via cc: IWL. And I'm happy that this could be done at
> relaxed pace thanks to Johannes

I think you may be expecting us to take Johannes's patch.
It's still on the table, but to make things clear -
upstream we prefer to wait for the "real fix", so if we agree
that fixing igb/igc is a better way (as Heiner pointed out on previous
version PM functions are called by the stack under rtnl elsewhere too,
just not while device is open) - we'll wait for that. Especially
that I'm 80% I complained about the PM in those drivers in
the past and nobody seemed to care. It's a constant source of rtnl
deadlocks.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-07 17:40       ` Jakub Kicinski
@ 2023-12-11  4:52         ` Marc MERLIN
  2023-12-15 13:42           ` Heiner Kallweit
  0 siblings, 1 reply; 9+ messages in thread
From: Marc MERLIN @ 2023-12-11  4:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Przemek Kitszel, Johannes Berg, Heiner Kallweit,
	intel-wired-lan@lists.osuosl.org, netdev

On Thu, Dec 07, 2023 at 09:40:21AM -0800, Jakub Kicinski wrote:
> On Thu, 7 Dec 2023 11:16:10 +0100 Przemek Kitszel wrote:
> > I have let know our igc TL, architect, and anybody that could be
> > interested via cc: IWL. And I'm happy that this could be done at
> > relaxed pace thanks to Johannes
> 
> I think you may be expecting us to take Johannes's patch.
> It's still on the table, but to make things clear -
> upstream we prefer to wait for the "real fix", so if we agree
> that fixing igb/igc is a better way (as Heiner pointed out on previous
> version PM functions are called by the stack under rtnl elsewhere too,
> just not while device is open) - we'll wait for that. Especially
> that I'm 80% I complained about the PM in those drivers in
> the past and nobody seemed to care. It's a constant source of rtnl
> deadlocks.

For whatever it's worth, I want to be clear that all stock kernels
are 100% unusable on lenovo P17gen2 because of this deadlock and that
without the temporary patch, my laptop would be usuable.
It was also a risk of data loss due to repeated deadlocks and unclean
shutdowns.

I cannot say what the correct fix is, but I am definitely hoping you
will accept some solution for the next stable kernel.

Thank you
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-11  4:52         ` Marc MERLIN
@ 2023-12-15 13:42           ` Heiner Kallweit
  2023-12-15 17:46             ` Marc MERLIN
  0 siblings, 1 reply; 9+ messages in thread
From: Heiner Kallweit @ 2023-12-15 13:42 UTC (permalink / raw)
  To: Marc MERLIN, Jakub Kicinski
  Cc: Przemek Kitszel, Johannes Berg, intel-wired-lan@lists.osuosl.org,
	netdev

On 11.12.2023 05:52, Marc MERLIN wrote:
> On Thu, Dec 07, 2023 at 09:40:21AM -0800, Jakub Kicinski wrote:
>> On Thu, 7 Dec 2023 11:16:10 +0100 Przemek Kitszel wrote:
>>> I have let know our igc TL, architect, and anybody that could be
>>> interested via cc: IWL. And I'm happy that this could be done at
>>> relaxed pace thanks to Johannes
>>
>> I think you may be expecting us to take Johannes's patch.
>> It's still on the table, but to make things clear -
>> upstream we prefer to wait for the "real fix", so if we agree
>> that fixing igb/igc is a better way (as Heiner pointed out on previous
>> version PM functions are called by the stack under rtnl elsewhere too,
>> just not while device is open) - we'll wait for that. Especially
>> that I'm 80% I complained about the PM in those drivers in
>> the past and nobody seemed to care. It's a constant source of rtnl
>> deadlocks.
> 
> For whatever it's worth, I want to be clear that all stock kernels
> are 100% unusable on lenovo P17gen2 because of this deadlock and that
> without the temporary patch, my laptop would be usuable.
> It was also a risk of data loss due to repeated deadlocks and unclean
> shutdowns.
> 
Why don't you simply disable runtime pm for the affected device as a
workaround? This can be done via sysfs.

> I cannot say what the correct fix is, but I am definitely hoping you
> will accept some solution for the next stable kernel.
> 
> Thank you
> Marc

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-15 13:42           ` Heiner Kallweit
@ 2023-12-15 17:46             ` Marc MERLIN
  2023-12-24 16:30               ` Marc MERLIN
  0 siblings, 1 reply; 9+ messages in thread
From: Marc MERLIN @ 2023-12-15 17:46 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jakub Kicinski, Johannes Berg, netdev,
	intel-wired-lan@lists.osuosl.org, Przemek Kitszel

On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote:
> Why don't you simply disable runtime pm for the affected device as a
> workaround? This can be done via sysfs.

1) because I didn't know what the exact bug was and how to work around it :)
2) without power management, the battery use is not good, but yes not
good is better than laptop crashing :)

That said, if it's only affecting wired ethernet, I can also unload the
igc module until I actually need wired ethernet, which is sometimes but
not often.

Thanks for your suggestion
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-15 17:46             ` Marc MERLIN
@ 2023-12-24 16:30               ` Marc MERLIN
  2023-12-24 23:12                 ` Heiner Kallweit
  0 siblings, 1 reply; 9+ messages in thread
From: Marc MERLIN @ 2023-12-24 16:30 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jakub Kicinski, Johannes Berg, netdev,
	intel-wired-lan@lists.osuosl.org, Przemek Kitszel

On Fri, Dec 15, 2023 at 09:46:34AM -0800, Marc MERLIN wrote:
> On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote:
> > Why don't you simply disable runtime pm for the affected device as a
> > workaround? This can be done via sysfs.
> 
> 1) because I didn't know what the exact bug was and how to work around it :)

Mmmh, so I need to test an ubuntu kernel (6.5.0-14) because of sound
issues in mainline TOT, and I can't boot the kernel to completion
without hititng this hang bug. I'm not exactly sure which part of the
boot triggers it.

I can't patch that kernel easily. How exactly do I disable runtime PM
from the kernel command line for "that device" which I'm not even sure
which one it is.  If it's the eth device, I already removed the igc
module to prevent it from loading, and I also removed the ethtool
binary, but I'm still getting the hang.

On the plus side, with 6.6.8 and the old patch which I understand is not
the ideal solution, I can confirm that I've been running problem free
until now, so thanks again for that interim patch.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-24 16:30               ` Marc MERLIN
@ 2023-12-24 23:12                 ` Heiner Kallweit
  2023-12-25  8:03                   ` Sasha Neftin
  0 siblings, 1 reply; 9+ messages in thread
From: Heiner Kallweit @ 2023-12-24 23:12 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Jakub Kicinski, Johannes Berg, netdev,
	intel-wired-lan@lists.osuosl.org, Przemek Kitszel

On 24.12.2023 17:30, Marc MERLIN wrote:
> On Fri, Dec 15, 2023 at 09:46:34AM -0800, Marc MERLIN wrote:
>> On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote:
>>> Why don't you simply disable runtime pm for the affected device as a
>>> workaround? This can be done via sysfs.
>>
>> 1) because I didn't know what the exact bug was and how to work around it :)
> 
> Mmmh, so I need to test an ubuntu kernel (6.5.0-14) because of sound
> issues in mainline TOT, and I can't boot the kernel to completion
> without hititng this hang bug. I'm not exactly sure which part of the
> boot triggers it.
> 
> I can't patch that kernel easily. How exactly do I disable runtime PM
> from the kernel command line for "that device" which I'm not even sure

Change <device>/power/control from "auto" to "on".

> which one it is.  If it's the eth device, I already removed the igc
> module to prevent it from loading, and I also removed the ethtool
> binary, but I'm still getting the hang.
> 
> On the plus side, with 6.6.8 and the old patch which I understand is not
> the ideal solution, I can confirm that I've been running problem free
> until now, so thanks again for that interim patch.
> 
> Thanks,
> Marc

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-24 23:12                 ` Heiner Kallweit
@ 2023-12-25  8:03                   ` Sasha Neftin
  2023-12-25 11:21                     ` Marc MERLIN
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Neftin @ 2023-12-25  8:03 UTC (permalink / raw)
  To: Heiner Kallweit, Marc MERLIN
  Cc: Jakub Kicinski, Johannes Berg, intel-wired-lan@lists.osuosl.org,
	Przemek Kitszel, netdev

On 25/12/2023 1:12, Heiner Kallweit wrote:
> On 24.12.2023 17:30, Marc MERLIN wrote:
>> On Fri, Dec 15, 2023 at 09:46:34AM -0800, Marc MERLIN wrote:
>>> On Fri, Dec 15, 2023 at 02:42:01PM +0100, Heiner Kallweit wrote:
>>>> Why don't you simply disable runtime pm for the affected device as a
>>>> workaround? This can be done via sysfs.
>>>
>>> 1) because I didn't know what the exact bug was and how to work around it :)
>>
>> Mmmh, so I need to test an ubuntu kernel (6.5.0-14) because of sound
>> issues in mainline TOT, and I can't boot the kernel to completion
>> without hititng this hang bug. I'm not exactly sure which part of the
>> boot triggers it.
>>
>> I can't patch that kernel easily. How exactly do I disable runtime PM
>> from the kernel command line for "that device" which I'm not even sure
> 
> Change <device>/power/control from "auto" to "on".

Need to figure out your controller location in a file system via 
lspci/lspci -t and then change to "on"
For example: echo on > 
/sys/devices/pci0000\:00/0000\:00\:1c.0/0000\:ae\:00.0/power/control

We are starting to look at this problem, but I can't reproduce the 
problem on my machines yet.

> 
>> which one it is.  If it's the eth device, I already removed the igc
>> module to prevent it from loading, and I also removed the ethtool
>> binary, but I'm still getting the hang.
>>
>> On the plus side, with 6.6.8 and the old patch which I understand is not
>> the ideal solution, I can confirm that I've been running problem free
>> until now, so thanks again for that interim patch.
>>
>> Thanks,
>> Marc
> 
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL
  2023-12-25  8:03                   ` Sasha Neftin
@ 2023-12-25 11:21                     ` Marc MERLIN
  0 siblings, 0 replies; 9+ messages in thread
From: Marc MERLIN @ 2023-12-25 11:21 UTC (permalink / raw)
  To: Sasha Neftin
  Cc: netdev, intel-wired-lan@lists.osuosl.org, Przemek Kitszel,
	Jakub Kicinski, Johannes Berg, Heiner Kallweit

On Mon, Dec 25, 2023 at 10:03:23AM +0200, Sasha Neftin wrote:
> > > I can't patch that kernel easily. How exactly do I disable runtime PM
> > > from the kernel command line for "that device" which I'm not even sure
> > 
> > Change <device>/power/control from "auto" to "on".
> 
> Need to figure out your controller location in a file system via lspci/lspci
> -t and then change to "on"
> For example: echo on >
> /sys/devices/pci0000\:00/0000\:00\:1c.0/0000\:ae\:00.0/power/control
> 
> We are starting to look at this problem, but I can't reproduce the problem
> on my machines yet.

Thanks. I realized it was going to be hard either way if the boot hangs
before I get to a command prompt, which was what was happening
yesterday.
I had to boot ubuntu to debug a sound issue, and it was very tricky
since most of the time it hung before I got to a command prompt, but I
was finally able to get it to work long enough to debug the sound issue
and go back to my self built kernel to port over the sound config I
needed.

I wish I could tell you exactly how to reproduct this in a more useful
way, sorry about that.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-12-26 16:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20231206113934.8d7819857574.I2deb5804ef1739a2af307283d320ef7d82456494@changeid>
     [not found] ` <20231206084448.53b48c49@kernel.org>
     [not found]   ` <e6f227ee701e1ee37e8f568b1310d240a2b8935a.camel@sipsolutions.net>
2023-12-07 10:16     ` [Intel-wired-lan] [PATCH net v3] net: ethtool: do runtime PM outside RTNL Przemek Kitszel
2023-12-07 17:40       ` Jakub Kicinski
2023-12-11  4:52         ` Marc MERLIN
2023-12-15 13:42           ` Heiner Kallweit
2023-12-15 17:46             ` Marc MERLIN
2023-12-24 16:30               ` Marc MERLIN
2023-12-24 23:12                 ` Heiner Kallweit
2023-12-25  8:03                   ` Sasha Neftin
2023-12-25 11:21                     ` Marc MERLIN

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox