Date: Fri, 20 Nov 2020 15:28:09 -0600
From: "K.R. Foley"
To: Randy Dunlap
Cc: Jeff Moyer, linux-fsdevel@vger.kernel.org
Subject: Re: BUG triggers running lsof
References: <4cc7a530-41ed-81f4-82cd-6a3a93661dce@infradead.org> <5310969ec0c67c25ae2eff16f1e904d5@cybsft.com>
Message-ID: <55e19a5d6a64d24280c3eb82ef7cf183@cybsft.com>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On 2020-11-20 15:13, Randy Dunlap wrote:
> On 11/20/20 12:59 PM, K.R. Foley wrote:
>>
>> On 2020-11-20 13:51, Jeff Moyer wrote:
>>> Randy Dunlap writes:
>>>
>>>> On 11/20/20 11:16 AM, K.R. Foley wrote:
>>>>> I have found an issue that triggers by running lsof. The problem is
>>>>> reproducible, but not consistently. I have seen this issue occur on
>>>>> multiple versions of the kernel (5.0.10, 5.2.8 and now 5.4.77). It
>>>>> looks like it could be a race condition or the file pointer is being
>>>>> corrupted. Any pointers on how to track this down? What additional
>>>>> information can I provide?
>>>>
>>>> Hi,
>>>>
>>>> 2 things in general:
>>>>
>>>> a) Can you test with a more recent kernel?
>>>>
>>>> b) Can you reproduce this without loading the proprietary &
>>>> out-of-tree kernel modules? They should never have been loaded
>>>> after bootup. I.e., don't just unload them -- that could leave
>>>> something bad behind.
>>>
>>> Heh, the EIP contains part of the name of one of the modules:
>>>
>>>>> [ 8057.297159] BUG: unable to handle page fault for address: 31376f63
>>>                                                                ^^^^^^^^
>
> Thanks for noticing that, Jeff. I should have seen it.
>
>>>>> [ 8057.297219] Modules linked in: ITXico7100Module(O)
>>>                                         ^^^^
>>
>> Perhaps this is a dumb question, but how could this happen?
>
> We don't know what is in that loadable kernel module, so we can't
> give a definitive answer to your question, other than it's buggy.
> Or maybe it was just written for an older kernel version.
> Or a kernel with different build options/settings.

I am starting to look at this now. It was written for an older kernel by
someone else. Thank you for the tips.

> Have you contacted IT support?
>
> It would (will) be interesting to see if you can reproduce the problem
> without these modules being loaded...
> I kind of doubt it, but if it does still fail, it will give us
> something to look at.

Knowing a little more now. I doubt it will be reproducible without the
module.

-- 
Regards,
K.R. Foley
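
For anyone reading this thread in the archive: Jeff's observation can be checked by decoding the faulting address as ASCII. The sketch below is not part of the original exchange. It assumes a 32-bit little-endian x86 machine (suggested by the mention of EIP), and the variable names are illustrative only.

```python
import struct

# Faulting address and module name as reported in the quoted oops output.
fault_addr = 0x31376F63
module_name = "ITXico7100Module"

# Assumption: little-endian x86, so the least significant byte comes
# first in memory. Pack the address the way it would be stored.
raw = struct.pack("<I", fault_addr)

# Interpret the four bytes as ASCII text.
text = raw.decode("ascii")

print(f"address bytes in memory order: {raw.hex(' ')}")
print(f"as ASCII: {text!r}")
print(f"substring of module name: {text in module_name}")
```

If the decoded bytes turn out to be a fragment of the module name, that is consistent with string data having been used as a pointer or having overwritten one, which fits the suspicion about the out-of-tree module. It does not by itself show where the corruption happens.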