From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tejun Heo
Subject: Re: 2.6.34 PDC20268 PATA IO error loop makes system unusable
Date: Mon, 14 Jun 2010 09:59:39 +0200
Message-ID: <4C15E16B.5000702@kernel.org>
References: <20100613154808.GA10408@basil.fritz.box>
 <4C15DDA0.3020409@kernel.org>
 <20100614075350.GC17092@basil.fritz.box>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Received: from hera.kernel.org ([140.211.167.34]:58540 "EHLO
 hera.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753438Ab0FNH7o (ORCPT );
 Mon, 14 Jun 2010 03:59:44 -0400
In-Reply-To: <20100614075350.GC17092@basil.fritz.box>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Andi Kleen
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org

Hello,

On 06/14/2010 09:53 AM, Andi Kleen wrote:
> On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
>>> sd 11:0:0:0: [sdd] Unhandled error code
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
>>> sd 11:0:0:0: [sdd] Unhandled error code
>>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
>>>
>>> same messages repeating forever, just with CDB changing occasionally.
>>>
>>> ....
>>>
>>> not stopping until I reset the box.
>>
>> Did you have a lot of dirty pages? It looks like upper layer is
>
> Yes, there was a dd running.
>
>> trying to flush all the dirty buffers and SCSI is a tad bit too
>> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
>
> A bit too verbose? That's really an euphemism ...

Yeap, of course it was. :-)

> During the CDB: Write loop the console was totally unusable!
>
> And I think the fsyncs in syslogd completely made the performance
> tank.

The console often becomes the bottleneck too when there are a lot of
kernel messages.

> So basically it was a "reset button only" situation.
>
> When the device is gone what's the point in giving a message
> more than once? Can't the requests just be silently failed in this
> case?

Yeah, it would be better to somehow summarize those error messages
instead of spitting out all of them.

Thanks.

-- 
tejun
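
For illustration only, the "report once, then summarize" idea discussed
above might look roughly like the userspace C sketch below. It is not
kernel or SCSI-layer code, and it is not a patch from this thread. The
struct and function names (fail_summary, report_io_failure,
summarize_io_failures) are hypothetical, and so is the message wording.
The kernel has its own rate-limiting helpers such as printk_ratelimit(),
which this sketch does not use.

```c
#include <stdio.h>

/* Hypothetical per-device state: has the first failure been reported,
 * and how many later failures were swallowed since the last summary? */
struct fail_summary {
    int reported;
    unsigned long suppressed;
};

/* Report a failed request. Only the first failure prints a line; later
 * ones are just counted. Returns the number of lines written (0 or 1). */
static int report_io_failure(struct fail_summary *fs, const char *dev,
                             FILE *out)
{
    if (!fs->reported) {
        fprintf(out, "%s: device gone, failing I/O\n", dev);
        fs->reported = 1;
        return 1;
    }
    fs->suppressed++;
    return 0;
}

/* Emit one summary line for everything suppressed so far and reset the
 * counter. Returns how many failures the summary covered. */
static unsigned long summarize_io_failures(struct fail_summary *fs,
                                           const char *dev, FILE *out)
{
    unsigned long n = fs->suppressed;

    if (n)
        fprintf(out, "%s: %lu further I/O failures suppressed\n", dev, n);
    fs->suppressed = 0;
    return n;
}
```

A caller would zero-initialize one struct fail_summary per device, call
report_io_failure() for each failed request, and call
summarize_io_failures() at some convenient point (for example from a
timer, or when the device is torn down), so a flood of failed writes
costs two console lines instead of three per request.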