* Re: "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git [not found] <a0adeea50901080007q5a3ed5c8y5a744ce37e677325@mail.gmail.com> @ 2009-01-08 14:37 ` Dave Kleikamp 2009-01-08 15:21 ` Arjan van de Ven 2009-01-09 5:18 ` "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git Grissiom 0 siblings, 2 replies; 17+ messages in thread From: Dave Kleikamp @ 2009-01-08 14:37 UTC (permalink / raw) To: Grissiom; +Cc: linux-kernel, Arjan van de Ven, linux-fsdevel Adding cc:lix-fsdevel On Thu, 2009-01-08 at 16:07 +0800, Grissiom wrote: > When I using the latest git version, I occasionally got this in dmesg: > > [ 2008.237234] BUG: scheduling while atomic: pdflush/30/0x00000002 > [ 2008.237240] 2 locks held by pdflush/30: > [ 2008.237244] #0: (mutex){--..}, at: [<c01a57a1>] sync_filesystems+0x11/0x120 > [ 2008.237258] #1: (sb_lock){--..}, at: [<c01a57ab>] > sync_filesystems+0x1b/0x120 > [ 2008.237277] Modules linked in: fuse ricoh_mmc b43 > [ 2008.237288] Pid: 30, comm: pdflush Not tainted > 2.6.28-g14-rfkill-nophy-ledon-07485-g9e42d0c #62 > [ 2008.237294] Call Trace: > [ 2008.237303] [<c04d7576>] schedule+0x326/0x8e0 > [ 2008.237311] [<c01500d3>] __lock_acquire+0x293/0xa20 > [ 2008.237321] [<c014e307>] mark_held_locks+0x67/0x80 > [ 2008.237330] [<c04da58c>] _spin_unlock_irqrestore+0x4c/0x60 > [ 2008.237339] [<c014e519>] trace_hardirqs_on_caller+0x149/0x1a0 > [ 2008.237351] [<c0143d55>] async_synchronize_cookie_special+0xb5/0x140 > [ 2008.237362] [<c013e1a0>] autoremove_wake_function+0x0/0x40 > [ 2008.237372] [<c01a57fc>] sync_filesystems+0x6c/0x120 > [ 2008.237381] [<c017ee90>] pdflush+0x0/0x1e0 > [ 2008.237392] [<c01c0b90>] do_sync+0x20/0x60 > [ 2008.237402] [<c01c0bda>] sys_sync+0xa/0x10 > [ 2008.237412] [<c017ef9e>] pdflush+0x10e/0x1e0 > [ 2008.237420] [<c014e519>] trace_hardirqs_on_caller+0x149/0x1a0 > [ 2008.237429] [<c017de80>] laptop_flush+0x0/0x10 > [ 2008.237437] [<c013dea2>] kthread+0x42/0x70 > [ 2008.237444] [<c013de60>] kthread+0x0/0x70 > [ 2008.237452] [<c0103c03>] kernel_thread_helper+0x7/0x14 > (repeat some times) > The offender is http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=efaee192 async_synchronize_full_special() shouldn't be called while holding a spinlock, sb_lock. I think this patch should fix it. Arjan, would this work? Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> diff --git a/fs/super.c b/fs/super.c index cb20744..7d67387 100644 --- a/fs/super.c +++ b/fs/super.c @@ -458,7 +458,6 @@ void sync_filesystems(int wait) if (sb->s_flags & MS_RDONLY) continue; sb->s_need_sync_fs = 1; - async_synchronize_full_special(&sb->s_async_list); } restart: @@ -471,6 +470,7 @@ restart: sb->s_count++; spin_unlock(&sb_lock); down_read(&sb->s_umount); + async_synchronize_full_special(&sb->s_async_list); if (sb->s_root && (wait || sb->s_dirt)) sb->s_op->sync_fs(sb, wait); up_read(&sb->s_umount); -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git 2009-01-08 14:37 ` "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git Dave Kleikamp @ 2009-01-08 15:21 ` Arjan van de Ven 2009-01-08 15:46 ` [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock Dave Kleikamp 2009-01-09 5:18 ` "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git Grissiom 1 sibling, 1 reply; 17+ messages in thread From: Arjan van de Ven @ 2009-01-08 15:21 UTC (permalink / raw) To: Dave Kleikamp; +Cc: Grissiom, linux-kernel, linux-fsdevel On Thu, 08 Jan 2009 08:37:52 -0600 Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote: > Adding cc:lix-fsdevel > > On Thu, 2009-01-08 at 16:07 +0800, Grissiom wrote: > > When I using the latest git version, I occasionally got this in > > dmesg: > > > > [ 2008.237234] BUG: scheduling while atomic: pdflush/30/0x00000002 > > [ 2008.237240] 2 locks held by pdflush/30: > > [ 2008.237244] #0: (mutex){--..}, at: [<c01a57a1>] > > sync_filesystems+0x11/0x120 [ 2008.237258] #1: (sb_lock){--..}, > > at: [<c01a57ab>] sync_filesystems+0x1b/0x120 > > [ 2008.237277] Modules linked in: fuse ricoh_mmc b43 > > [ 2008.237288] Pid: 30, comm: pdflush Not tainted > > 2.6.28-g14-rfkill-nophy-ledon-07485-g9e42d0c #62 > > [ 2008.237294] Call Trace: > > [ 2008.237303] [<c04d7576>] schedule+0x326/0x8e0 > > [ 2008.237311] [<c01500d3>] __lock_acquire+0x293/0xa20 > > [ 2008.237321] [<c014e307>] mark_held_locks+0x67/0x80 > > [ 2008.237330] [<c04da58c>] _spin_unlock_irqrestore+0x4c/0x60 > > [ 2008.237339] [<c014e519>] trace_hardirqs_on_caller+0x149/0x1a0 > > [ 2008.237351] [<c0143d55>] > > async_synchronize_cookie_special+0xb5/0x140 [ 2008.237362] > > [<c013e1a0>] autoremove_wake_function+0x0/0x40 [ 2008.237372] > > [<c01a57fc>] sync_filesystems+0x6c/0x120 [ 2008.237381] > > [<c017ee90>] pdflush+0x0/0x1e0 [ 2008.237392] [<c01c0b90>] > > do_sync+0x20/0x60 [ 2008.237402] [<c01c0bda>] sys_sync+0xa/0x10 > > [ 2008.237412] [<c017ef9e>] pdflush+0x10e/0x1e0 > > [ 2008.237420] [<c014e519>] trace_hardirqs_on_caller+0x149/0x1a0 > > [ 2008.237429] [<c017de80>] laptop_flush+0x0/0x10 > > [ 2008.237437] [<c013dea2>] kthread+0x42/0x70 > > [ 2008.237444] [<c013de60>] kthread+0x0/0x70 > > [ 2008.237452] [<c0103c03>] kernel_thread_helper+0x7/0x14 > > (repeat some times) > > > > The offender is > http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=efaee192 > > async_synchronize_full_special() shouldn't be called while holding a > spinlock, sb_lock. > > I think this patch should fix it. Arjan, would this work? yeah it will woops/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-08 15:21 ` Arjan van de Ven @ 2009-01-08 15:46 ` Dave Kleikamp 2009-01-08 22:50 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Dave Kleikamp @ 2009-01-08 15:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: Grissiom, linux-kernel, linux-fsdevel, Arjan van de Ven sync_filesystems() shouldn't be calling async_synchronize_full_special while holding a spinlock. The second while loop in that function is the right place for this anyway. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Reported-by: Grissiom <chaos.proton@gmail.com> diff --git a/fs/super.c b/fs/super.c index cb20744..7d67387 100644 --- a/fs/super.c +++ b/fs/super.c @@ -458,7 +458,6 @@ void sync_filesystems(int wait) if (sb->s_flags & MS_RDONLY) continue; sb->s_need_sync_fs = 1; - async_synchronize_full_special(&sb->s_async_list); } restart: @@ -471,6 +470,7 @@ restart: sb->s_count++; spin_unlock(&sb_lock); down_read(&sb->s_umount); + async_synchronize_full_special(&sb->s_async_list); if (sb->s_root && (wait || sb->s_dirt)) sb->s_op->sync_fs(sb, wait); up_read(&sb->s_umount); -- David Kleikamp IBM Linux Technology Center ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-08 15:46 ` [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock Dave Kleikamp @ 2009-01-08 22:50 ` Dave Chinner 2009-01-08 22:51 ` Arjan van de Ven 2009-01-09 12:31 ` Evgeniy Polyakov 0 siblings, 2 replies; 17+ messages in thread From: Dave Chinner @ 2009-01-08 22:50 UTC (permalink / raw) To: Dave Kleikamp Cc: Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel, Arjan van de Ven On Thu, Jan 08, 2009 at 09:46:31AM -0600, Dave Kleikamp wrote: > sync_filesystems() shouldn't be calling async_synchronize_full_special > while holding a spinlock. The second while loop in that function is the > right place for this anyway. Out of curiousity, what on earth does async_synchronize_full_special() do and why does it need to be in sync_filesystems()? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-08 22:50 ` Dave Chinner @ 2009-01-08 22:51 ` Arjan van de Ven 2009-01-09 0:32 ` Alan Cox 2009-01-09 1:40 ` Dave Chinner 2009-01-09 12:31 ` Evgeniy Polyakov 1 sibling, 2 replies; 17+ messages in thread From: Arjan van de Ven @ 2009-01-08 22:51 UTC (permalink / raw) To: Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel <linux-fsdeve Dave Chinner wrote: > On Thu, Jan 08, 2009 at 09:46:31AM -0600, Dave Kleikamp wrote: >> sync_filesystems() shouldn't be calling async_synchronize_full_special >> while holding a spinlock. The second while loop in that function is the >> right place for this anyway. > > Out of curiousity, what on earth does > async_synchronize_full_special() do and why does it need to be in > sync_filesystems()? > now that we have asynchronous operations, this function makes sure that all the functions that we started async before this point complete, so that what when you sync, you sync all in progress work. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-08 22:51 ` Arjan van de Ven @ 2009-01-09 0:32 ` Alan Cox 2009-01-09 0:38 ` Arjan van de Ven 2009-01-09 1:40 ` Dave Chinner 1 sibling, 1 reply; 17+ messages in thread From: Alan Cox @ 2009-01-09 0:32 UTC (permalink / raw) To: Arjan van de Ven Cc: Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel, Arjan van de Ven On Thu, 08 Jan 2009 14:51:52 -0800 Arjan van de Ven <arjan@linux.intel.com> wrote: > Dave Chinner wrote: > > On Thu, Jan 08, 2009 at 09:46:31AM -0600, Dave Kleikamp wrote: > >> sync_filesystems() shouldn't be calling async_synchronize_full_special > >> while holding a spinlock. The second while loop in that function is the > >> right place for this anyway. > > > > Out of curiousity, what on earth does > > async_synchronize_full_special() do and why does it need to be in > > sync_filesystems()? > > > now that we have asynchronous operations, this function makes sure that all the functions > that we started async before this point complete, so that what when you sync, you sync all in progress work. So why is it special - wouldn't async_synchronize_all() be a bit (or complete_all) be a bit more clear ? Alan ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-09 0:32 ` Alan Cox @ 2009-01-09 0:38 ` Arjan van de Ven 0 siblings, 0 replies; 17+ messages in thread From: Arjan van de Ven @ 2009-01-09 0:38 UTC (permalink / raw) To: Alan Cox Cc: Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel Alan Cox wrote: > On Thu, 08 Jan 2009 14:51:52 -0800 > Arjan van de Ven <arjan@linux.intel.com> wrote: > >> Dave Chinner wrote: >>> On Thu, Jan 08, 2009 at 09:46:31AM -0600, Dave Kleikamp wrote: >>>> sync_filesystems() shouldn't be calling async_synchronize_full_special >>>> while holding a spinlock. The second while loop in that function is the >>>> right place for this anyway. >>> Out of curiousity, what on earth does >>> async_synchronize_full_special() do and why does it need to be in >>> sync_filesystems()? >>> >> now that we have asynchronous operations, this function makes sure that all the functions >> that we started async before this point complete, so that what when you sync, you sync all in progress work. > > So why is it special - wouldn't async_synchronize_all() be a bit (or > complete_all) be a bit more clear ? there is 2 types; one synchronizes the global kernel events one is special in that it only synchronizes the events you give it (via a list head) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-08 22:51 ` Arjan van de Ven 2009-01-09 0:32 ` Alan Cox @ 2009-01-09 1:40 ` Dave Chinner 2009-01-09 4:45 ` Arjan van de Ven 1 sibling, 1 reply; 17+ messages in thread From: Dave Chinner @ 2009-01-09 1:40 UTC (permalink / raw) To: Arjan van de Ven Cc: Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel On Thu, Jan 08, 2009 at 02:51:52PM -0800, Arjan van de Ven wrote: > Dave Chinner wrote: >> On Thu, Jan 08, 2009 at 09:46:31AM -0600, Dave Kleikamp wrote: >>> sync_filesystems() shouldn't be calling async_synchronize_full_special >>> while holding a spinlock. The second while loop in that function is the >>> right place for this anyway. >> >> Out of curiousity, what on earth does >> async_synchronize_full_special() do and why does it need to be in >> sync_filesystems()? >> > now that we have asynchronous operations, this function makes sure that all the functions > that we started async before this point complete, so that what when you sync, you sync all in progress work. So what you are saying that we now have async operations that need magical undocumented sync primitives apparently randomly strewn throughout the code base? Could you name the interface somewhat better? it's an async operation synchronisation primitive - currently the name is full of words that are typically suffixes to something that is typically "namespace_operation_suffix". This is whole API looks like "suffix_suffix_suffix_suffix"..... <rummage around the code> Oh, fuck. You've made generic_delete_inode() an asychronous operation. This is bad. Really Bad. XFS does transactions in ->clear_inode so what you've done is removed one of the synchronous throttleѕ on inode removal that prevents build-up of massive numbers of deleted XFS inodes in memory that have run the first unlink phase but not the second. I know for a fact (because this has been done within XFS before) that this causes serious writeback, sync and unmount latency problems with XFS as well as introducing a whole new class of OOM problems when under heavy load. The depth of the async queues requires serious throttling to prevent these issues from occurring. How bad? For example, rm -rf そf a larg enumber of files can cause unmount or sync can take hours to run the async queues under heavy load because the inodes existed only in memory and are randomly spread around the filesystem (this is a real world example). In this case under heavy memory load, XFS required inode buffer RMW to do the modifcation, and given that it was on software RAID5 this also required a RAID5 RMW cycle. The result? The async inode delete queue drained at *50* deleted inodes per second. The synchronous unlink rate was around 5000 inodes per second..... So, given the potential impact of this change, what testing have you done in terms of: - performance impact - regression testing - sync() safety - removing a million files and queuing all of the deletes in the async queues.... On all the different filesystems under heavy I/O workloads? Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-09 1:40 ` Dave Chinner @ 2009-01-09 4:45 ` Arjan van de Ven 2009-01-09 8:22 ` Dave Chinner 2009-01-12 2:31 ` Jamie Lokier 0 siblings, 2 replies; 17+ messages in thread From: Arjan van de Ven @ 2009-01-09 4:45 UTC (permalink / raw) To: Arjan van de Ven, Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel Dave Chinner wrote: > > So, given the potential impact of this change, what testing have > you done in terms of: > > - performance impact I tested this on my machines and it gave a real performance improvement (11 to 8 seconds for a full kernel tree unlink, and cutting out latency for normal applications) > - sync() safety that was exactly the synchronization point that's discussed here. > - removing a million files and queuing all of the > deletes in the async queues.... the async code throttles at 32k outstanding. Yes 32K is arbitrary, but if you delete a million files fast, all but the first few thousand are synchronous. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-09 4:45 ` Arjan van de Ven @ 2009-01-09 8:22 ` Dave Chinner 2009-01-09 15:09 ` Chris Mason 2009-01-12 2:31 ` Jamie Lokier 1 sibling, 1 reply; 17+ messages in thread From: Dave Chinner @ 2009-01-09 8:22 UTC (permalink / raw) To: Arjan van de Ven Cc: Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel On Thu, Jan 08, 2009 at 08:45:06PM -0800, Arjan van de Ven wrote: > Dave Chinner wrote: > >> >> So, given the potential impact of this change, what testing have >> you done in terms of: >> >> - performance impact > > I tested this on my machines and it gave a real performance > improvement (11 to 8 seconds for a full kernel tree unlink, and > cutting out latency for normal applications) Not really waht I'd call a significant performance test. What happens under serious sustained I/O load? On multiple different filesystems? FWIW, my current kernel tree is: $ find . -print | wc -l 30082 Which is just under the async operation limit you set so this probably didn't even stress the mixing of sync and async deletes at the same time... >> - sync() safety > that was exactly the synchronization point that's discussed here. Right - how many fs developers did you discuss the impact of this with? Did you crash test any filesystems? Did you run any fs QA suites to check for regressions? How many different filesystems did you even test? We already know that sync is busted and doesn't provide the guarantees it is supposed so on some filesystems. I'm extremely concerned that changing unlink behaviour in this manner subtly breaks sync even more than it already is.... >> - removing a million files and queuing all of the >> deletes in the async queues.... > > the async code throttles at 32k outstanding. > Yes 32K is arbitrary, but if you delete a million files fast, all > but the first few thousand are synchronous. No, after the first 32k you get a mix of synchronous and async as the async queue gets processed in parallel. This means you are screwing up the temporal locality of the unlink of inodes. i.e. some unlinks are being done immediately, others are being delayed until another 32k inodes have been processed of the unlink list. If we've been careful about allocation locality during untar so that inodes that are allocated sequentially by untar will be deleted sequential by rm -rf, then this new pattern will break those optimisations because the unlink locality is being broken by the some sync/some async behaviour that this queuing mechanism creates. Hmmmm - I suspect that rm -rf will also trigger lots of kernel threads to be created. We do not want 256 kernel threads to be spawned to bang on the serialised regions of filesystem journals (generating lock contention) when one thread would be sufficient.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-09 8:22 ` Dave Chinner @ 2009-01-09 15:09 ` Chris Mason 0 siblings, 0 replies; 17+ messages in thread From: Chris Mason @ 2009-01-09 15:09 UTC (permalink / raw) To: Dave Chinner Cc: Arjan van de Ven, Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel On Fri, 2009-01-09 at 19:22 +1100, Dave Chinner wrote: > > > > the async code throttles at 32k outstanding. > > Yes 32K is arbitrary, but if you delete a million files fast, all > > but the first few thousand are synchronous. > > No, after the first 32k you get a mix of synchronous and async as > the async queue gets processed in parallel. This means you are > screwing up the temporal locality of the unlink of inodes. i.e. some > unlinks are being done immediately, others are being delayed until > another 32k inodes have been processed of the unlink list. > > > If we've been careful about allocation locality during untar so > that inodes that are allocated sequentially by untar will be > deleted sequential by rm -rf, then this new pattern will break those > optimisations because the unlink locality is being broken by > the some sync/some async behaviour that this queuing mechanism > creates. > You won't notice on ext34, but xfs, btrfs and probably jfs should go slower with this. The throttle should wait for one of the entries on the queue to finish, and then put this inode on the queue in order. Doing the current one synchronously will generate random IO. On XFS, create 256,000 4k files, drop caches and then time find . | xargs rm > Hmmmm - I suspect that rm -rf will also trigger lots of kernel > threads to be created. We do not want 256 kernel threads to > be spawned to bang on the serialised regions of filesystem > journals (generating lock contention) when one thread would > be sufficient.... This could be good and bad. -chris ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-09 4:45 ` Arjan van de Ven 2009-01-09 8:22 ` Dave Chinner @ 2009-01-12 2:31 ` Jamie Lokier 2009-01-12 3:54 ` Arjan van de Ven 2009-01-12 7:48 ` Dave Chinner 1 sibling, 2 replies; 17+ messages in thread From: Jamie Lokier @ 2009-01-12 2:31 UTC (permalink / raw) To: Arjan van de Ven Cc: Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel Arjan van de Ven wrote: > > - removing a million files and queuing all of the > > deletes in the async queues.... > > the async code throttles at 32k outstanding. > Yes 32K is arbitrary, but if you delete a million files fast, all but the > first few thousand are > synchronous. Hmm. If I call unlink() a thousand times and then call fsync() on the parent directories covering files I've unlinked... I expect the deletes to be committed to disk when the last fsync() has returned. I require that a crash and restart will not see the files. Several kinds of transactional software and even some shell scripts expect this. Will these asynchronous deletes break the guaranteed commit-of-the-delete provided by fsync() on the parent directory? -- Jamie ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-12 2:31 ` Jamie Lokier @ 2009-01-12 3:54 ` Arjan van de Ven 2009-01-12 7:55 ` Dave Chinner 2009-01-12 7:48 ` Dave Chinner 1 sibling, 1 reply; 17+ messages in thread From: Arjan van de Ven @ 2009-01-12 3:54 UTC (permalink / raw) To: Jamie Lokier Cc: Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel Jamie Lokier wrote: > Arjan van de Ven wrote: >>> - removing a million files and queuing all of the >>> deletes in the async queues.... >> the async code throttles at 32k outstanding. >> Yes 32K is arbitrary, but if you delete a million files fast, all but the >> first few thousand are >> synchronous. > > Hmm. > > If I call unlink() a thousand times and then call fsync() on the > parent directories covering files I've unlinked... I expect the > deletes to be committed to disk when the last fsync() has returned. I > require that a crash and restart will not see the files. Several > kinds of transactional software and even some shell scripts expect this. > > Will these asynchronous deletes break the guaranteed > commit-of-the-delete provided by fsync() on the parent directory? 3 things: 1) removing the name from the directory and removing the data from disk are independent things. The former happens from unlink(), the later happens when the refcount hits 0 (eg no more openers nor any directory on disk referencing it). fsync() on a parent dir obviously only covers the first part, while only the 2nd part was made asynchronous. 2) with the right synchronization point in fsync, it will still work out 3) this code will be redone for 2.6.30; for 2.6.29 it is removed. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-12 3:54 ` Arjan van de Ven @ 2009-01-12 7:55 ` Dave Chinner 0 siblings, 0 replies; 17+ messages in thread From: Dave Chinner @ 2009-01-12 7:55 UTC (permalink / raw) To: Arjan van de Ven Cc: Jamie Lokier, Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel On Mon, Jan 12, 2009 at 03:54:41AM +0000, Arjan van de Ven wrote: > Jamie Lokier wrote: >> Arjan van de Ven wrote: >>>> - removing a million files and queuing all of the >>>> deletes in the async queues.... >>> the async code throttles at 32k outstanding. >>> Yes 32K is arbitrary, but if you delete a million files fast, all >>> but the first few thousand are >>> synchronous. >> >> Hmm. >> >> If I call unlink() a thousand times and then call fsync() on the >> parent directories covering files I've unlinked... I expect the >> deletes to be committed to disk when the last fsync() has returned. I >> require that a crash and restart will not see the files. Several >> kinds of transactional software and even some shell scripts expect this. >> >> Will these asynchronous deletes break the guaranteed >> commit-of-the-delete provided by fsync() on the parent directory? > > 3 things: > 1) removing the name from the directory and removing the data from disk are independent things. > The former happens from unlink(), the later happens when the refcount hits 0 (eg no more openers nor > any directory on disk referencing it). fsync() on a parent dir obviously only covers the first part, > while only the 2nd part was made asynchronous. > 2) with the right synchronization point in fsync, it will still work out What scope does that synchronisation point have? I sincerely hope you are not proposing to put a filesystem global synchronisation point into fsync.... > 3) this code will be redone for 2.6.30; for 2.6.29 it is removed. Can you tell use how you plan to redo this code and test it adequately for 2.6.30? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-12 2:31 ` Jamie Lokier 2009-01-12 3:54 ` Arjan van de Ven @ 2009-01-12 7:48 ` Dave Chinner 1 sibling, 0 replies; 17+ messages in thread From: Dave Chinner @ 2009-01-12 7:48 UTC (permalink / raw) To: Jamie Lokier Cc: Arjan van de Ven, Dave Kleikamp, Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel On Mon, Jan 12, 2009 at 02:31:38AM +0000, Jamie Lokier wrote: > Arjan van de Ven wrote: > > > - removing a million files and queuing all of the > > > deletes in the async queues.... > > > > the async code throttles at 32k outstanding. > > Yes 32K is arbitrary, but if you delete a million files fast, all but the > > first few thousand are > > synchronous. > > Hmm. > > If I call unlink() a thousand times and then call fsync() on the > parent directories covering files I've unlinked... I expect the > deletes to be committed to disk when the last fsync() has returned. I > require that a crash and restart will not see the files. Several > kinds of transactional software and even some shell scripts expect this. > > Will these asynchronous deletes break the guaranteed > commit-of-the-delete provided by fsync() on the parent directory? It depends on the implementation of the filesystem you are running on. On XFS, it won't break anything because it uses a two-phase unlink. The unlink() syscall removes the inode from the namespace transactionally which means that even if you crash after the directory fsync then they will never, ever appear in the directory after recovery. In fact, recovery will see all those inodes as existing on the unlinked lists and hence the execute the second phase of the unlink during recovery. I have no idea what the behaviour of other filesystems will be but it needs to be evaluated on a filesystem-by-filesystem basis. Hmmmm. That introduces another interesting question w.r.t async unlink in XFS - it's going to hammer the on-disk unlinked list hash tables in XFS. The hashes are only 64 chains wide (per AG) so pumping thousands of inodes into them is going to increase the minimum memory footprint of XFS during rm -rf operations substantially. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock 2009-01-08 22:50 ` Dave Chinner 2009-01-08 22:51 ` Arjan van de Ven @ 2009-01-09 12:31 ` Evgeniy Polyakov 1 sibling, 0 replies; 17+ messages in thread From: Evgeniy Polyakov @ 2009-01-09 12:31 UTC (permalink / raw) To: Arjan van de Ven Cc: Linus Torvalds, Grissiom, linux-kernel, linux-fsdevel, Dave Kleikamp Hi Arjan. Likely my mail is in your blacklist that's why you did not reply on comment on the discussion about this async interface month or so ago :) Anyway, what you did for the boot process should be only there, but vfs changes have to be discussed in fsdevel. For example generic_delete_inode() does not delete inode anymore, instead it queues the work to the set of threads, serialized by the global spinlock and atomic variables, allocating additional structure for this not from the memory pool or at least memory cache, but directly via slab. This atomic allocation thus may be invoked during the cache shrink path to free some memory, which is not a good idea. So if you do insist that deletion of every inode should be async, at least shrink size of the async_event and embed it into the inode, or use memory pool. Just several other notes on the code. __async_schedule() could first check amount of pending requests and do not allocate and free the entry if all threads are busy. entry processing code run_one_entry() should not free entry under the irq-off lock. Thread start routing should check if thread successfully started before increasing number of the running thread, or this can be done in the thread callback itself. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git 2009-01-08 14:37 ` "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git Dave Kleikamp 2009-01-08 15:21 ` Arjan van de Ven @ 2009-01-09 5:18 ` Grissiom 1 sibling, 0 replies; 17+ messages in thread From: Grissiom @ 2009-01-09 5:18 UTC (permalink / raw) To: Dave Kleikamp; +Cc: linux-kernel, Arjan van de Ven, linux-fsdevel On Thu, Jan 8, 2009 at 22:37, Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote: > > The offender is > http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=efaee192 > > async_synchronize_full_special() shouldn't be called while holding a > spinlock, sb_lock. > > I think this patch should fix it. Arjan, would this work? > > Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> > > diff --git a/fs/super.c b/fs/super.c > index cb20744..7d67387 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -458,7 +458,6 @@ void sync_filesystems(int wait) > if (sb->s_flags & MS_RDONLY) > continue; > sb->s_need_sync_fs = 1; > - async_synchronize_full_special(&sb->s_async_list); > } > > restart: > @@ -471,6 +470,7 @@ restart: > sb->s_count++; > spin_unlock(&sb_lock); > down_read(&sb->s_umount); > + async_synchronize_full_special(&sb->s_async_list); > if (sb->s_root && (wait || sb->s_dirt)) > sb->s_op->sync_fs(sb, wait); > up_read(&sb->s_umount); > > -- > David Kleikamp > IBM Linux Technology Center > > Problem gone with this patch applied. Thanks for the fixing. -- Cheers, Grissiom ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2009-01-12 7:55 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <a0adeea50901080007q5a3ed5c8y5a744ce37e677325@mail.gmail.com>
2009-01-08 14:37 ` "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git Dave Kleikamp
2009-01-08 15:21 ` Arjan van de Ven
2009-01-08 15:46 ` [PATCH] async: Don't call async_synchronize_full_special() while holding sb_lock Dave Kleikamp
2009-01-08 22:50 ` Dave Chinner
2009-01-08 22:51 ` Arjan van de Ven
2009-01-09 0:32 ` Alan Cox
2009-01-09 0:38 ` Arjan van de Ven
2009-01-09 1:40 ` Dave Chinner
2009-01-09 4:45 ` Arjan van de Ven
2009-01-09 8:22 ` Dave Chinner
2009-01-09 15:09 ` Chris Mason
2009-01-12 2:31 ` Jamie Lokier
2009-01-12 3:54 ` Arjan van de Ven
2009-01-12 7:55 ` Dave Chinner
2009-01-12 7:48 ` Dave Chinner
2009-01-09 12:31 ` Evgeniy Polyakov
2009-01-09 5:18 ` "BUG: scheduling while atomic: pdflush/30/0x00000002" in latest git Grissiom
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).