From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S932619AbVHTB1I (ORCPT ); Fri, 19 Aug 2005 21:27:08 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S932617AbVHTB1H (ORCPT ); Fri, 19 Aug 2005 21:27:07 -0400
Received: from tornado.reub.net ([202.89.145.182]:52913 "EHLO tornado.reub.net")
    by vger.kernel.org with ESMTP id S932619AbVHTB1H (ORCPT );
    Fri, 19 Aug 2005 21:27:07 -0400
Message-ID: <430686EA.3000901@reub.net>
Date: Sat, 20 Aug 2005 13:27:06 +1200
From: Reuben Farrelly
User-Agent: Thunderbird 1.6a1 (Windows/20050819)
MIME-Version: 1.0
To: Andrew Morton
CC: linux-kernel@vger.kernel.org
Subject: Re: 2.6.13-rc6-mm1
References: <20050819043331.7bc1f9a9.akpm@osdl.org> <4305DCC6.70906@reub.net> <20050819103435.2c88a9f2.akpm@osdl.org>
In-Reply-To: <20050819103435.2c88a9f2.akpm@osdl.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi again,

On 20/08/2005 5:34 a.m., Andrew Morton wrote:
> Reuben Farrelly wrote:
>> A few new problems cropped up with this kernel..
>>
>> 1. NFS seems to be unstable, oopsing when shutting down:
>
> --- devel/fs/nfsd/nfssvc.c~ingo-nfs-stuff-fix 2005-08-19 10:29:15.000000000 -0700
> +++ devel-akpm/fs/nfsd/nfssvc.c 2005-08-19 10:30:03.000000000 -0700
> @@ -286,7 +286,6 @@ out:
>  	/* Release the thread */
>  	svc_exit_thread(rqstp);
>  
> -	unlock_kernel();
>  	/* Release module */
>  	unlock_kernel();
>  	module_put_and_exit(0);
> _

That fixed it, thanks.

>> Aug 20 12:26:10 tornado kernel: Device not ready.
>>
>> 2. That message on the third line of the trace above: "kernel: Device not
>> ready." is being logged every few mins or so, I believe it is my SCSI CDROM
>> that is causing it.
>> It also logs something similar after the SCSI driver has
>> probed the device on boot:
>>
>> Aug 20 12:24:36 tornado kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA
>> DRIVER, Rev 7.0
>> Aug 20 12:24:36 tornado kernel:
>> Aug 20 12:24:36 tornado kernel: aic7880: Ultra Wide Channel A, SCSI
>> Id=7, 16/253 SCBs
>> Aug 20 12:24:36 tornado kernel:
>> Aug 20 12:24:36 tornado kernel: Vendor: SONY Model: CD-RW CRX145S
>> Rev: 1.0b
>> Aug 20 12:24:36 tornado kernel: Type: CD-ROM
>> ANSI SCSI revision: 04
>> Aug 20 12:24:36 tornado kernel: target0:0:6: Beginning Domain Validation
>> Aug 20 12:24:36 tornado kernel: target0:0:6: Domain Validation skipping write
>> tests
>> Aug 20 12:24:36 tornado kernel: target0:0:6: FAST-10 SCSI 10.0 MB/s ST (100
>> ns, offset 15)
>> Aug 20 12:24:36 tornado kernel: target0:0:6: Ending Domain Validation
>> Aug 20 12:24:36 tornado kernel: Device not ready.
>>
>> This has been a problem for quite a few weeks now, albeit I believe, only a
>> cosmetic one.
>
> Is some application trying to poll the device?

I wonder if hald knows something about this and is polling. However, the
"Device not ready" message above is logged while the kernel is booting,
before any userspace stuff has started up. Maybe hald is just being a bit
aggressive in re-probing the drive after userspace launches. By all accounts,
after a week of uptime the drive certainly ought to be ready, and it seems to
work ok ;-)

Note the extra space after 'Device' and 'not', which implies that some text
may be missing (text which would have made it clearer exactly which device is
not ready). The case-sensitive strings "Device" and "not ready" appear
together in scsi_lib.c and very few other places.

> Is the device actually "not ready", or is it in reality ready and working?
> ie: what happens if you stick a CD in it?

The CD can be read, and the error messages go away. They stay away even after
the CD has been ejected.

>> 4.
>> PAM is complaining about "PAM audit_open() failed: Protocol not supported"
>> and I can't log in as any user including root. I would have picked this
>> was a userspace problem, but it doesn't break with -rc5-mm1, yet reproducibly
>> breaks with -rc6-mm1. Weird.
>
> hm. How come you're able to use the machine then?

The machine was booting up ok, and things were being written to syslog. I
rebooted into -rc5-mm1 to investigate, and of course could boot into -rc6-mm1
in single user mode, test, and bring services up one by one from there.
Having two boxes helped too.

> Is it possible to get an strace of this failure somehow?

Not sure if this is needed anymore, as I found that the problem goes away
when I compile in kernel auditing. This was not required for -rc5-mm1. Is
that change intended?

reuben
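
In lieu of an strace, here is a minimal sketch of a probe that might show the
same failure without going through PAM. It assumes (not confirmed in this
thread) that audit_open() essentially opens a raw netlink socket with the
audit protocol, so a kernel built without auditing would refuse it with
"Protocol not supported". The function name probe_audit_socket is
hypothetical, made up for this illustration; NETLINK_AUDIT is taken to be 9
as in <linux/netlink.h>. Linux only.

```python
import os
import socket

# Assumption: audit protocol number as defined in <linux/netlink.h>.
NETLINK_AUDIT = 9


def probe_audit_socket():
    """Return 0 if an audit netlink socket can be opened, else the errno."""
    try:
        sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_AUDIT)
    except OSError as exc:
        # On a kernel without audit support this is expected to be
        # EPROTONOSUPPORT, i.e. "Protocol not supported".
        return exc.errno
    sock.close()
    return 0


if __name__ == "__main__":
    err = probe_audit_socket()
    if err == 0:
        print("audit netlink socket opened OK")
    else:
        print("audit netlink socket failed: %s" % os.strerror(err))
```

Running this under both -rc5-mm1 and -rc6-mm1 with auditing compiled out
should indicate whether the socket call itself started failing, or whether
only the userspace handling of that failure differs between the two.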