* BUG triggers running lsof
@ 2020-11-20 19:16 K.R. Foley
2020-11-20 19:42 ` Randy Dunlap
0 siblings, 1 reply; 7+ messages in thread
From: K.R. Foley @ 2020-11-20 19:16 UTC (permalink / raw)
To: linux-fsdevel
I have found an issue that is triggered by running lsof. The problem is
reproducible, but not consistently. I have seen it on multiple kernel
versions (5.0.10, 5.2.8 and now 5.4.77). It looks like it could be a
race condition, or the file pointer may be getting corrupted. Any
pointers on how to track this down? What additional information can I
provide?
[ 8057.297159] BUG: unable to handle page fault for address: 31376f63
[ 8057.297163] #PF: supervisor read access in kernel mode
[ 8057.297164] #PF: error_code(0x0000) - not-present page
[ 8057.297166] *pde = 00000000
[ 8057.297168] Oops: 0000 [#1] SMP
[ 8057.297171] CPU: 1 PID: 461 Comm: lsof Tainted: P O 5.4.77-PRD.1.5 #3
[ 8057.297172] Hardware name: Incredible Technologies Inc. Nighthawk/IMBM-B75A-A20-IT01, BIOS 0404 03/14/2014
[ 8057.297175] EIP: 0x31376f63
[ 8057.297176] Code: Bad RIP value.
[ 8057.297177] EAX: f55962d0 EBX: f55962d0 ECX: 31376f63 EDX: f69ddd80
[ 8057.297179] ESI: f69ddd80 EDI: f6899b00 EBP: c2621e88 ESP: c2621e5c
[ 8057.297180] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010206
[ 8057.297182] CR0: 80050033 CR2: 31376f59 CR3: 046e1000 CR4: 000406d0
[ 8057.297183] Call Trace:
[ 8057.297189] ? seq_show+0xfe/0x138
[ 8057.297191] seq_read+0x144/0x3da
[ 8057.297193] ? seq_lseek+0x171/0x171
[ 8057.297196] __vfs_read+0x2d/0x1ba
[ 8057.297198] ? __do_sys_fstat64+0x49/0x50
[ 8057.297200] vfs_read+0x7a/0xfc
[ 8057.297203] ksys_read+0x4c/0xb0
[ 8057.297203] ksys_read+0x4c/0xb0
[ 8057.297205] sys_read+0x11/0x13
[ 8057.297207] do_fast_syscall_32+0x8f/0x1de
[ 8057.297210] entry_SYSENTER_32+0xa2/0xf5
[ 8057.297211] EIP: 0xb7f578e5
[ 8057.297213] Code: d9 89 da 89 f3 e8 17 00 00 00 89 d3 eb dd b8 40 42 0f 00 eb c7 8b 04 24 c3 8b 1c 24 c3 8b 34 24 c3 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[ 8057.297215] EAX: ffffffda EBX: 00000007 ECX: 09e54490 EDX: 00000400
[ 8057.297216] ESI: 09e36a90 EDI: b7f43000 EBP: bf9fde18 ESP: bf9fddb0
[ 8057.297217] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
[ 8057.297219] Modules linked in: ITXico7100Module(O) ITDongle1Module(O) ITIOBoard2BootLoaderModule(O) ITIOBoard1Module(O) ITBiosWormModule(O) it87 hwmon_vid ipv6 cfg80211 evdev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi fuse ledtrig_audio snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm_oss nvidia_drm(PO) snd_pcm nvidia_modeset(PO) nvidia(PO) snd_mixer_oss ti_usb_3410_5052 snd_timer iTCO_wdt realtek usbserial iTCO_vendor_support snd sg r8169 serio_raw lpc_ich x86_pkg_temp_thermal i2c_i801 coretemp libphy mii xhci_pci xhci_hcd ehci_pci ext4 jbd2 ext2 mbcache uhci_hcd ehci_hcd sd_mod ata_piix [last unloaded: ITXico7100Module]
[ 8057.297241] CR2: 0000000031376f63
[ 8057.297244] ---[ end trace 455c8cdc1bacfeda ]---
[ 8057.297245] EIP: 0x31376f63
[ 8057.297246] Code: Bad RIP value.
[ 8057.297247] EAX: f55962d0 EBX: f55962d0 ECX: 31376f63 EDX: f69ddd80
[ 8057.297248] ESI: f69ddd80 EDI: f6899b00 EBP: c2621e88 ESP: c2621e5c
[ 8057.297250] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010206
[ 8057.297251] CR0: 80050033 CR2: 31376f59 CR3: 046e1000 CR4: 000406d0
--
Regards,
K.R. Foley
* Re: BUG triggers running lsof
2020-11-20 19:16 BUG triggers running lsof K.R. Foley
@ 2020-11-20 19:42 ` Randy Dunlap
2020-11-20 19:51 ` Jeff Moyer
2020-11-20 20:56 ` K.R. Foley
0 siblings, 2 replies; 7+ messages in thread
From: Randy Dunlap @ 2020-11-20 19:42 UTC (permalink / raw)
To: K.R. Foley, linux-fsdevel
On 11/20/20 11:16 AM, K.R. Foley wrote:
> I have found an issue that triggers by running lsof. The problem is reproducible, but not consistently. I have seen this issue occur on multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It looks like it could be a race condition or the file pointer is being corrupted. Any pointers on how to track this down? What additional information can I provide?
Hi,
2 things in general:
a) Can you test with a more recent kernel?
b) Can you reproduce this without loading the proprietary & out-of-tree
kernel modules? That is, test on a boot where they have never been loaded.
Don't just unload them -- that could leave something bad behind.
> [ 8057.297159] BUG: unable to handle page fault for address: 31376f63
> [ 8057.297163] #PF: supervisor read access in kernel mode
> [ 8057.297164] #PF: error_code(0x0000) - not-present page
> [ 8057.297166] *pde = 00000000
> [ 8057.297168] Oops: 0000 [#1] SMP
> [ 8057.297171] CPU: 1 PID: 461 Comm: lsof Tainted: P O 5.4.77-PRD.1.5 #3
> [ 8057.297172] Hardware name: Incredible Technologies Inc. Nighthawk/IMBM-B75A-A20-IT01, BIOS 0404 03/14/2014
> [ 8057.297175] EIP: 0x31376f63
> [ 8057.297176] Code: Bad RIP value.
> [ 8057.297177] EAX: f55962d0 EBX: f55962d0 ECX: 31376f63 EDX: f69ddd80
> [ 8057.297179] ESI: f69ddd80 EDI: f6899b00 EBP: c2621e88 ESP: c2621e5c
> [ 8057.297180] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010206
> [ 8057.297182] CR0: 80050033 CR2: 31376f59 CR3: 046e1000 CR4: 000406d0
> [ 8057.297183] Call Trace:
> [ 8057.297189] ? seq_show+0xfe/0x138
> [ 8057.297191] seq_read+0x144/0x3da
> [ 8057.297193] ? seq_lseek+0x171/0x171
> [ 8057.297196] __vfs_read+0x2d/0x1ba
> [ 8057.297198] ? __do_sys_fstat64+0x49/0x50
> [ 8057.297200] vfs_read+0x7a/0xfc
> [ 8057.297203] ksys_read+0x4c/0xb0
> [ 8057.297203] ksys_read+0x4c/0xb0
> [ 8057.297205] sys_read+0x11/0x13
> [ 8057.297207] do_fast_syscall_32+0x8f/0x1de
> [ 8057.297210] entry_SYSENTER_32+0xa2/0xf5
> [ 8057.297211] EIP: 0xb7f578e5
> [ 8057.297213] Code: d9 89 da 89 f3 e8 17 00 00 00 89 d3 eb dd b8 40 42 0f 00 eb c7 8b 04 24 c3 8b 1c 24 c3 8b 34 24 c3 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> [ 8057.297215] EAX: ffffffda EBX: 00000007 ECX: 09e54490 EDX: 00000400
> [ 8057.297216] ESI: 09e36a90 EDI: b7f43000 EBP: bf9fde18 ESP: bf9fddb0
> [ 8057.297217] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
> [ 8057.297219] Modules linked in: ITXico7100Module(O) ITDongle1Module(O) ITIOBoard2BootLoaderModule(O) ITIOBoard1Module(O) ITBiosWormModule(O) it87 hwmon_vid ipv6 cfg80211 evdev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi fuse ledtrig_audio snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm_oss nvidia_drm(PO) snd_pcm nvidia_modeset(PO) nvidia(PO) snd_mixer_oss ti_usb_3410_5052 snd_timer iTCO_wdt realtek usbserial iTCO_vendor_support snd sg r8169 serio_raw lpc_ich x86_pkg_temp_thermal i2c_i801 coretemp libphy mii xhci_pci xhci_hcd ehci_pci ext4 jbd2 ext2 mbcache uhci_hcd ehci_hcd sd_mod ata_piix [last unloaded: ITXico7100Module]
> [ 8057.297241] CR2: 0000000031376f63
> [ 8057.297244] ---[ end trace 455c8cdc1bacfeda ]---
> [ 8057.297245] EIP: 0x31376f63
> [ 8057.297246] Code: Bad RIP value.
> [ 8057.297247] EAX: f55962d0 EBX: f55962d0 ECX: 31376f63 EDX: f69ddd80
> [ 8057.297248] ESI: f69ddd80 EDI: f6899b00 EBP: c2621e88 ESP: c2621e5c
> [ 8057.297250] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010206
> [ 8057.297251] CR0: 80050033 CR2: 31376f59 CR3: 046e1000 CR4: 000406d0
>
>
--
~Randy
* Re: BUG triggers running lsof
2020-11-20 19:42 ` Randy Dunlap
@ 2020-11-20 19:51 ` Jeff Moyer
2020-11-20 20:59 ` K.R. Foley
2020-11-20 20:56 ` K.R. Foley
1 sibling, 1 reply; 7+ messages in thread
From: Jeff Moyer @ 2020-11-20 19:51 UTC (permalink / raw)
To: Randy Dunlap; +Cc: K.R. Foley, linux-fsdevel
Randy Dunlap <rdunlap@infradead.org> writes:
> On 11/20/20 11:16 AM, K.R. Foley wrote:
>> I have found an issue that triggers by running lsof. The problem is
>> reproducible, but not consistently. I have seen this issue occur on
>> multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
>> looks like it could be a race condition or the file pointer is being
>> corrupted. Any pointers on how to track this down? What additional
>> information can I provide?
>
> Hi,
>
> 2 things in general:
>
> a) Can you test with a more recent kernel?
>
> b) Can you reproduce this without loading the proprietary & out-of-tree
> kernel modules? They should never have been loaded after bootup.
> I.e., don't just unload them -- that could leave something bad behind.
Heh, the EIP contains part of the name of one of the modules:
>
>> [ 8057.297159] BUG: unable to handle page fault for address: 31376f63
^^^^^^^^
>> [ 8057.297219] Modules linked in: ITXico7100Module(O)
^^^^
-Jeff
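
To spell out the observation above: x86 is little-endian, so the four bytes
of the faulting address can be read back as text in memory order. The
following is only a minimal sketch for illustration; Python is used for
convenience and the helper name address_as_ascii is hypothetical, not
something from the kernel or from this thread.

```python
import struct


def address_as_ascii(addr: int) -> str:
    """Reinterpret a 32-bit value as the bytes it occupies in memory
    on a little-endian machine, shown as printable ASCII."""
    raw = struct.pack("<I", addr)  # little-endian, unsigned 32-bit
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Faulting address taken from the oops ("address: 31376f63").
fault_addr = 0x31376F63
text = address_as_ascii(fault_addr)

print(text)                        # co71
print(text in "ITXico7100Module")  # True
```

An address that decodes to printable text like this usually means a pointer
was overwritten with string data, which fits the module name showing up in
the EIP.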
* Re: BUG triggers running lsof
2020-11-20 19:42 ` Randy Dunlap
2020-11-20 19:51 ` Jeff Moyer
@ 2020-11-20 20:56 ` K.R. Foley
1 sibling, 0 replies; 7+ messages in thread
From: K.R. Foley @ 2020-11-20 20:56 UTC (permalink / raw)
To: Randy Dunlap; +Cc: linux-fsdevel
---
Regards,
K.R. Foley
On 2020-11-20 13:42, Randy Dunlap wrote:
> On 11/20/20 11:16 AM, K.R. Foley wrote:
>> I have found an issue that triggers by running lsof. The problem is
>> reproducible, but not consistently. I have seen this issue occur on
>> multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
>> looks like it could be a race condition or the file pointer is being
>> corrupted. Any pointers on how to track this down? What additional
>> information can I provide?
>
> Hi,
>
> 2 things in general:
>
> a) Can you test with a more recent kernel?
>
> b) Can you reproduce this without loading the proprietary & out-of-tree
> kernel modules? They should never have been loaded after bootup.
> I.e., don't just unload them -- that could leave something bad behind.
I can try to reproduce with a newer kernel and without the modules.
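
For picking out which modules to leave unloaded, the flags in the "Modules
linked in" line help: (O) marks an out-of-tree module and (P) a proprietary
one. Below is a minimal sketch, assuming an excerpt of that line copied from
the oops; the function name flagged_modules is hypothetical and the regular
expression is only an illustration of one way to parse it.

```python
import re

# Excerpt of the "Modules linked in" line from the oops (shortened).
MODULES_EXCERPT = (
    "ITXico7100Module(O) ITDongle1Module(O) it87 hwmon_vid "
    "nvidia_drm(PO) snd_pcm nvidia_modeset(PO) nvidia(PO) ext4"
)


def flagged_modules(line: str) -> dict:
    """Return {module_name: flags} for modules carrying taint flags
    such as (O) or (PO); unflagged in-tree modules are skipped."""
    return dict(re.findall(r"(\w+)\(([A-Z]+)\)", line))


for name, flags in flagged_modules(MODULES_EXCERPT).items():
    print(f"{name}: {flags}")
```

The modules this prints are the ones that should never be loaded during the
test boot.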
* Re: BUG triggers running lsof
2020-11-20 19:51 ` Jeff Moyer
@ 2020-11-20 20:59 ` K.R. Foley
2020-11-20 21:13 ` Randy Dunlap
0 siblings, 1 reply; 7+ messages in thread
From: K.R. Foley @ 2020-11-20 20:59 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Randy Dunlap, linux-fsdevel
On 2020-11-20 13:51, Jeff Moyer wrote:
> Randy Dunlap <rdunlap@infradead.org> writes:
>
>> On 11/20/20 11:16 AM, K.R. Foley wrote:
>>> I have found an issue that triggers by running lsof. The problem is
>>> reproducible, but not consistently. I have seen this issue occur on
>>> multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
>>> looks like it could be a race condition or the file pointer is being
>>> corrupted. Any pointers on how to track this down? What additional
>>> information can I provide?
>>
>> Hi,
>>
>> 2 things in general:
>>
>> a) Can you test with a more recent kernel?
>>
>> b) Can you reproduce this without loading the proprietary & out-of-tree
>> kernel modules? They should never have been loaded after bootup.
>> I.e., don't just unload them -- that could leave something bad behind.
>
> Heh, the EIP contains part of the name of one of the modules:
>
>>
>>> [ 8057.297159] BUG: unable to handle page fault for address: 31376f63
>
> ^^^^^^^^
>
>>> [ 8057.297219] Modules linked in: ITXico7100Module(O)
> ^^^^
Perhaps this is a dumb question, but how could this happen?
> -Jeff
kr
* Re: BUG triggers running lsof
2020-11-20 20:59 ` K.R. Foley
@ 2020-11-20 21:13 ` Randy Dunlap
2020-11-20 21:28 ` K.R. Foley
0 siblings, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2020-11-20 21:13 UTC (permalink / raw)
To: K.R. Foley, Jeff Moyer; +Cc: linux-fsdevel
On 11/20/20 12:59 PM, K.R. Foley wrote:
>
>
>
> On 2020-11-20 13:51, Jeff Moyer wrote:
>> Randy Dunlap <rdunlap@infradead.org> writes:
>>
>>> On 11/20/20 11:16 AM, K.R. Foley wrote:
>>>> I have found an issue that triggers by running lsof. The problem is
>>>> reproducible, but not consistently. I have seen this issue occur on
>>>> multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
>>>> looks like it could be a race condition or the file pointer is being
>>>> corrupted. Any pointers on how to track this down? What additional
>>>> information can I provide?
>>>
>>> Hi,
>>>
>>> 2 things in general:
>>>
>>> a) Can you test with a more recent kernel?
>>>
>>> b) Can you reproduce this without loading the proprietary & out-of-tree
>>> kernel modules? They should never have been loaded after bootup.
>>> I.e., don't just unload them -- that could leave something bad behind.
>>
>> Heh, the EIP contains part of the name of one of the modules:
>>
>>>
>>>> [ 8057.297159] BUG: unable to handle page fault for address: 31376f63
>> ^^^^^^^^
Thanks for noticing that, Jeff. I should have seen it.
>>>> [ 8057.297219] Modules linked in: ITXico7100Module(O)
>> ^^^^
>
> Perhaps this is a dumb question, but how could this happen?
We don't know what is in that loadable kernel module, so we can't
give a definitive answer to your question, other than it's buggy.
Or maybe it was just written for an older kernel version.
Or a kernel with different build options/settings.
Have you contacted IT support?
It would (will) be interesting to see if you can reproduce the problem
without these modules being loaded...
I kind of doubt it, but if it does still fail, it will give us something
to look at.
--
~Randy
* Re: BUG triggers running lsof
2020-11-20 21:13 ` Randy Dunlap
@ 2020-11-20 21:28 ` K.R. Foley
0 siblings, 0 replies; 7+ messages in thread
From: K.R. Foley @ 2020-11-20 21:28 UTC (permalink / raw)
To: Randy Dunlap; +Cc: Jeff Moyer, linux-fsdevel
On 2020-11-20 15:13, Randy Dunlap wrote:
> On 11/20/20 12:59 PM, K.R. Foley wrote:
>>
>>
>>
>> On 2020-11-20 13:51, Jeff Moyer wrote:
>>> Randy Dunlap <rdunlap@infradead.org> writes:
>>>
>>>> On 11/20/20 11:16 AM, K.R. Foley wrote:
>>>>> I have found an issue that triggers by running lsof. The problem is
>>>>> reproducible, but not consistently. I have seen this issue occur on
>>>>> multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
>>>>> looks like it could be a race condition or the file pointer is being
>>>>> corrupted. Any pointers on how to track this down? What additional
>>>>> information can I provide?
>>>>
>>>> Hi,
>>>>
>>>> 2 things in general:
>>>>
>>>> a) Can you test with a more recent kernel?
>>>>
>>>> b) Can you reproduce this without loading the proprietary & out-of-tree
>>>> kernel modules? They should never have been loaded after bootup.
>>>> I.e., don't just unload them -- that could leave something bad behind.
>>>
>>> Heh, the EIP contains part of the name of one of the modules:
>>>
>>>>
>>>>> [ 8057.297159] BUG: unable to handle page fault for address: 31376f63
>>>
>>> ^^^^^^^^
>
> Thanks for noticing that, Jeff. I should have seen it.
>
>>>>> [ 8057.297219] Modules linked in: ITXico7100Module(O)
>>> ^^^^
>>
>> Perhaps this is a dumb question, but how could this happen?
>
>
> We don't know what is in that loadable kernel module, so we can't
> give a definitive answer to your question, other than it's buggy.
> Or maybe it was just written for an older kernel version.
> Or a kernel with different build options/settings.
I am starting to look at this now. It was written for an older kernel by
someone else. Thank you for the tips.
>
> Have you contacted IT support?
>
> It would (will) be interesting to see if you can reproduce the problem
> without these modules being loaded...
> I kind of doubt it, but if it does still fail, it will give us something
> to look at.
Knowing a little more now, I doubt it will be reproducible without the
module.
--
Regards,
K.R. Foley