From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from postdiluvian.org ([128.30.54.21] helo=porklips.postdiluvian.org)
	by bombadil.infradead.org with esmtps (Exim 4.72 #1 (Red Hat Linux))
	id 1P14uY-0000bO-Nh
	for linux-mtd@lists.infradead.org; Wed, 29 Sep 2010 22:14:03 +0000
Received: from mason by porklips.postdiluvian.org with local (Exim 4.72)
	(envelope-from ) id 1P14uX-0008W5-4Y
	for linux-mtd@lists.infradead.org; Wed, 29 Sep 2010 18:14:01 -0400
Date: Wed, 29 Sep 2010 18:14:01 -0400
From: Mark Mason
To: linux-mtd@lists.infradead.org
Subject: Scheduler latency problems when using NAND
Message-ID: <20100929221401.GA32583@postdiluvian.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id: Linux MTD discussion mailing list

Hi all, I hope this is the right place for this question. I'm having some problems with scheduler latency when using UBIFS, and I'm hoping for some suggestions.

Linux 2.6.29-6, with a newer MTD, dating from probably around six months ago. Embedded PowerPC 8315 with the built-in NAND controller, using nand/fsl_elbc_nand.c. The NAND is a Samsung K9WAG08U1B two-die stack (one package with two chip selects), 2 Gbyte x 8 bit. The system has plenty of memory, but is short on CPU.

The application is storing streaming video - almost entirely large sequential files, roughly 250K to 15M - to a 1.6G filesystem. There's no seeking or rewriting, just creat, write, close, repeat. No compression is used on the filesystem.

The problem I'm seeing is excessively large scheduler latency when data is flushed to NAND. Originally this had been happening during erases. I noticed that hundreds of erases (up to around 700) were being issued in rapid succession, and I was seeing other threads unable to run for sometimes as much as the expected .7 seconds (I measured 1.1 ms per erase).
To address this, I split the erase command into two halves - FIR_OP_CM0 | FIR_OP_PA | FIR_OP_CM2 and FIR_OP_CW1 | FIR_OP_RS - with schedule() called in between. This has the effect of issuing the erase, calling schedule(), then waiting for the erase to complete if it hasn't already - but usually it has. I'm surprised this helped so much, since the calling thread should have been put to sleep for the duration of the erase by the call to wait_event_timeout(), but it definitely did help - I guess it was the explicit schedule().

The erases are no longer a significant bottleneck, but now the writes are. A page program takes 200 us, which seems too short to justify an explicit schedule(), and I am seeing periods with the busy line asserted in back-to-back 200 us chunks for most of a second.

I have played with thread priorities a bit, but I wound up with too many threads being "most important". There is some hardware that can't tolerate large latencies, and unfortunately the existing code base doesn't have enough separation between critical and non-critical tasks to let us run just the critical stuff at a higher priority.

On average, the system can keep up with the load, but it has problems with the burstiness of the flushes to NAND, so I'm hoping for some ideas to smooth the traffic out, or even a totally different way to approach the problem. I tried lowering the priority of the UBI background thread; the failure mode there is pretty obvious. I tried lowering dirty_background_centisecs, which helped a little, but not enough. There's also a SATA drive on the system, although a smaller commit interval probably wouldn't bother it, since its traffic is similar.

I'm contemplating something along the lines of a smaller commit interval, an even higher background-thread priority, and a sleep with a schedule() during the page program, but that many extra context switches are liable to be a problem - there's no L2 cache on this CPU, so context switches are extra expensive.
Does anyone have any suggestions, ideas, hints, advice, etc? Thanks!