From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Moyer
Subject: Re: Deadlocks due to per-process plugging
Date: Wed, 11 Jul 2012 12:05:51 -0400
Message-ID:
References: <20120711133735.GA8122@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: LKML , linux-fsdevel@vger.kernel.org, Tejun Heo , Jens Axboe
To: Jan Kara
Return-path:
In-Reply-To: <20120711133735.GA8122@quack.suse.cz> (Jan Kara's message of "Wed, 11 Jul 2012 15:37:35 +0200")
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Jan Kara writes:

> Hello,
>
> we've recently hit a deadlock in our QA runs which is caused by the
> per-process plugging code. The problem is as follows:
>   process A                                process B (kjournald)
>   generic_file_aio_write()
>     blk_start_plug(&plug);
>     ...
>     somewhere in here we allocate memory and
>     direct reclaim submits buffer X for IO
>     ...
>     ext3_write_begin()
>       ext3_journal_start()
>         we need more space in a journal
>         so we want to checkpoint old transactions,
>         we block waiting for kjournald to commit
>         a currently running transaction.
>                                            journal_commit_transaction()
>                                              wait for IO on buffer X
>                                              to complete as it is part
>                                              of the current transaction
>
>   => deadlock since A waits for B and B waits for A to do unplug.
> BTW: I don't think this is really ext3/ext4 specific. I think other
> filesystems can get into problems as well when direct reclaim submits some
> IO and the process subsequently blocks without submitting the IO.

So, I thought schedule would do the flush.  Checking the code:

asmlinkage void __sched schedule(void)
{
	struct task_struct *tsk = current;

	sched_submit_work(tsk);
	__schedule();
}

And sched_submit_work looks like this:

static inline void sched_submit_work(struct task_struct *tsk)
{
	if (!tsk->state || tsk_is_pi_blocked(tsk))
		return;
	/*
	 * If we are going to sleep and we have plugged IO queued,
	 * make sure to submit it to avoid deadlocks.
	 */
	if (blk_needs_flush_plug(tsk))
		blk_schedule_flush_plug(tsk);
}

This eventually ends in a call to blk_run_queue_async(q) after
submitting the I/O from the plug list.  Right?  So is the question
really why doesn't the kblockd workqueue get scheduled?

Cheers,
Jeff
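
For anyone who wants to poke at the flush-on-sleep logic quoted above
outside the kernel, here is a rough user-space model of it.  This is
only a sketch, not kernel code: every toy_* name, the struct layouts,
and the counters are made up for illustration, and the flush simply
moves a count instead of submitting real I/O or kicking kblockd.  It
does mirror the two early-return conditions in sched_submit_work(): a
task that is still runnable (state == 0) or that is reported as
PI-blocked returns before its plug is looked at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code: all names here are hypothetical. */

struct toy_plug {
	int queued;             /* requests held back on the plug list */
};

struct toy_task {
	long state;             /* 0: runnable, non-zero: about to sleep */
	bool pi_blocked;        /* stands in for tsk_is_pi_blocked() */
	struct toy_plug *plug;  /* NULL when no plug is active */
	int submitted;          /* requests handed on to the "device" */
};

/* Stands in for blk_needs_flush_plug(): is anything sitting on the plug? */
bool toy_needs_flush_plug(const struct toy_task *t)
{
	return t->plug != NULL && t->plug->queued > 0;
}

/* Stands in for blk_schedule_flush_plug(): push plugged requests out. */
void toy_flush_plug(struct toy_task *t)
{
	t->submitted += t->plug->queued;
	t->plug->queued = 0;
}

/* Same shape as the quoted sched_submit_work(). */
void toy_sched_submit_work(struct toy_task *t)
{
	if (!t->state || t->pi_blocked)
		return;
	if (toy_needs_flush_plug(t))
		toy_flush_plug(t);
}

/* Same shape as the quoted schedule(); the context switch is omitted. */
void toy_schedule(struct toy_task *t)
{
	toy_sched_submit_work(t);
}

/* Usage: a task with three plugged requests goes to sleep. */
int toy_demo(void)
{
	struct toy_plug plug = { .queued = 3 };
	struct toy_task task = {
		.state = 2,     /* any non-zero value means "not runnable" */
		.pi_blocked = false,
		.plug = &plug,
		.submitted = 0,
	};

	toy_schedule(&task);
	assert(plug.queued == 0);       /* the plug was drained on sleep */
	return task.submitted;
}
```

In this model a sleeping task always drains its plug unless one of the
two early returns fires, so those are the cases worth checking against
the trace.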