Subject: Re: Scheduler latency problems when using NAND
From: Artem Bityutskiy
To: Mark Mason
Cc: linux-mtd@lists.infradead.org, linux-kernel
Date: Thu, 30 Sep 2010 07:56:58 +0300
Message-ID: <1285822618.11684.9.camel@localhost>
In-Reply-To: <20100929221401.GA32583@postdiluvian.org>
References: <20100929221401.GA32583@postdiluvian.org>
Reply-To: dedekind1@gmail.com
List-Id: Linux MTD discussion mailing list

On Wed, 2010-09-29 at 18:14 -0400, Mark Mason wrote:
> Hi all,
>
> I hope this is the right place for this question. I'm having some
> problems with scheduler latency when using UBIFS, and I'm hoping for
> some suggestions.

Hi Mark,

this e-mail is not specific to UBIFS, so I suggest keeping lkml in CC.

I cannot suggest much. Off the top of my head: try enabling kernel
preemption (CONFIG_PREEMPT). In general, though, it sounds like you
actually need the RT tree. There is also the ftrace latency tracer;
try using it.

> Linux 2.6.29-6, with a newer MTD, dating from probably around six
> months ago. Embedded PowerPC 8315, with built-in NAND controller,
> using nand/fsl_elbc_nand.c. NAND is a Samsung K9WAG08U1B two-die
> stack (one package with two chip selects), 2Gbyte x 8 bit. The system
> has plenty of memory, but is short on CPU.
>
> The application is storing streaming video, almost entirely large
> sequential files, roughly 250K to 15M, to a 1.6G filesystem. There's
> no seeking or rewriting, just creat, write, close, repeat.
> No compression is used on the filesystem.
>
> The problem I'm seeing is excessively large scheduler latency when
> data is flushed to NAND.
>
> Originally this had been happening during erases. I noticed that
> hundreds of erases (up to around 700) were being issued in rapid
> succession, and I was seeing other threads unable to run for sometimes
> as much as the expected 7 seconds (I measured 1.1 ms per erase). To
> address this, I split the erase command into two halves (FIR_OP_CM0 |
> FIR_OP_PA | FIR_OP_CM2, then FIR_OP_CW1 | FIR_OP_RS), with schedule()
> called in between. This had the effect of issuing the erase, calling
> schedule(), and then waiting for the erase to complete if it hadn't
> already; usually it had.
>
> I'm surprised this helped so much, since the calling thread should
> have been put to sleep for the duration of the erase by the call to
> wait_event_timeout(), but it definitely did. I guess it was the
> explicit schedule().
>
> The erases are no longer a significant bottleneck, but now the writes
> are. A page program takes 200us, which seems too short for an
> explicit schedule(), and I am seeing periods with the busy line
> asserted in back-to-back 200us chunks for most of a second.
>
> I have played with thread priorities a bit, but I wound up with too
> many threads being "most important". There is some hardware that
> can't tolerate large latencies, and unfortunately the existing code
> base doesn't have enough separation between critical and non-critical
> tasks to allow us to run just the critical stuff at a higher priority.
>
> On average, the system can keep up with the load, but it has problems
> with the burstiness of the flushes to NAND, so I'm hoping for some
> ideas to smooth the traffic out, or even a totally different way to
> approach the problem. I tried lowering the priority of the UBI
> background thread; the failure mode there is pretty obvious.
> I tried lowering dirty_background_centisecs; that helped a little,
> but not enough. There's also a SATA drive, although a smaller commit
> interval probably wouldn't bother it, since the traffic is similar.
>
> I'm contemplating something along the lines of a smaller commit
> interval, an even higher background thread priority, and a sleep with
> a schedule during the page program, but that many extra context
> switches are liable to be a problem: there's no L2 cache on this CPU,
> so context switches are extra expensive.
>
> Does anyone have any suggestions, ideas, hints, advice, etc.?
>
> Thanks!

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)