* [2.6.36-rc1] unmount livelock due to racing with bdi-flusher threads
From: Dave Chinner @ 2010-08-21  8:41 UTC
  To: linux-kernel; +Cc: linux-fsdevel

Folks,

I just had an umount take a very long time, burning a CPU the entire
time. It wasn't the unmount thread, either; it was the bdi flusher
thread for the filesystem being unmounted. It was spinning with this
perf top trace:

           553144.00 76.9% writeback_inodes_wb  [kernel.kallsyms]
           106434.00 14.8% __ticket_spin_lock   [kernel.kallsyms]
            25646.00  3.6% __ticket_spin_unlock [kernel.kallsyms]
            10512.00  1.5% _raw_spin_lock       [kernel.kallsyms]
             9606.00  1.3% put_super            [kernel.kallsyms]
             7920.00  1.1% __put_super          [kernel.kallsyms]
             5592.00  0.8% down_read_trylock    [kernel.kallsyms]
               46.00  0.0% kfree                [kernel.kallsyms]
               22.00  0.0% __do_softirq         [kernel.kallsyms]
               19.00  0.0% wb_writeback         [kernel.kallsyms]
               16.00  0.0% wb_do_writeback      [kernel.kallsyms]
                8.00  0.0% queue_io             [kernel.kallsyms]
                6.00  0.0% run_timer_softirq    [kernel.kallsyms]
                6.00  0.0% local_bh_enable_ip   [kernel.kallsyms]

This went on for ~7m25s (according to the pmchart trace I had on
screen) before something broke the livelock by writing the inodes to
disk (maybe the xfssyncd) and the unmount then completed a couple
of seconds later.

From the above profile, I'm assuming that writeback_inodes_wb() was
seeing pin_sb_for_writeback(sb) fail and moving dirty inodes from
the b_io to the b_more_io list, then being called again, splicing
the inodes on b_more_io back to b_io, and then failing
pin_sb_for_writeback() again for each inode, moving them back to the
b_more_io list....
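
Roughly, the loop I'm looking at (paraphrased from fs/fs-writeback.c as
of 2.6.36-rc1 and simplified, so treat the details as illustrative):

	while (!list_empty(&wb->b_io)) {
		struct inode *inode = list_entry(wb->b_io.prev,
						 struct inode, i_list);
		struct super_block *sb = inode->i_sb;

		/*
		 * pin_sb_for_writeback() boils down to a
		 * down_read_trylock(&sb->s_umount); with umount holding
		 * s_umount for writing the trylock always fails, so every
		 * inode just gets shuffled to b_more_io and we go around
		 * again. The failure path also takes and drops a superblock
		 * reference, which would explain put_super()/__put_super()
		 * showing up in the profile.
		 */
		if (!pin_sb_for_writeback(sb)) {
			requeue_io(inode);
			continue;
		}
		writeback_sb_inodes(sb, wb, wbc, false);
		drop_super(sb);
	}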

This is on 2.6.36-rc1 + the radix tree fixes for writeback.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [2.6.36-rc1] unmount livelock due to racing with bdi-flusher threads
From: Dave Chinner @ 2010-09-13  2:41 UTC
  To: linux-kernel; +Cc: linux-fsdevel

ping?

On Sat, Aug 21, 2010 at 06:41:26PM +1000, Dave Chinner wrote:
> Folks,
> 
> I just had an umount take a very long time, burning a CPU the entire
> time. It wasn't the unmount thread, either; it was the bdi flusher
> thread for the filesystem being unmounted. It was spinning with this
> perf top trace:
> 
>            553144.00 76.9% writeback_inodes_wb  [kernel.kallsyms]
>            106434.00 14.8% __ticket_spin_lock   [kernel.kallsyms]
>             25646.00  3.6% __ticket_spin_unlock [kernel.kallsyms]
>             10512.00  1.5% _raw_spin_lock       [kernel.kallsyms]
>              9606.00  1.3% put_super            [kernel.kallsyms]
>              7920.00  1.1% __put_super          [kernel.kallsyms]
>              5592.00  0.8% down_read_trylock    [kernel.kallsyms]
>                46.00  0.0% kfree                [kernel.kallsyms]
>                22.00  0.0% __do_softirq         [kernel.kallsyms]
>                19.00  0.0% wb_writeback         [kernel.kallsyms]
>                16.00  0.0% wb_do_writeback      [kernel.kallsyms]
>                 8.00  0.0% queue_io             [kernel.kallsyms]
>                 6.00  0.0% run_timer_softirq    [kernel.kallsyms]
>                 6.00  0.0% local_bh_enable_ip   [kernel.kallsyms]
> 
> This went on for ~7m25s (according to the pmchart trace I had on
> screen) before something broke the livelock by writing the inodes to
> disk (maybe the xfssyncd) and the unmount then completed a couple
> of seconds later.
> 
> From the above profile, I'm assuming that writeback_inodes_wb() was
> seeing pin_sb_for_writeback(sb) fail and moving dirty inodes from
> the b_io to the b_more_io list, then being called again, splicing
> the inodes on b_more_io back to b_io, and then failing
> pin_sb_for_writeback() again for each inode, moving them back to the
> b_more_io list....
> 
> This is on 2.6.36-rc1 + the radix tree fixes for writeback.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

-- 
Dave Chinner
david@fromorbit.com


* Re: [2.6.36-rc1] unmount livelock due to racing with bdi-flusher threads
From: Jan Kara @ 2010-09-30 21:02 UTC
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, hch

On Mon 13-09-10 12:41:28, Dave Chinner wrote:
> ping?
  Pong ;) I finally had a look at this. Thanks for reporting this.

> > I just had an umount take a very long time, burning a CPU the entire
> > time. It wasn't the unmount thread, either; it was the bdi flusher
> > thread for the filesystem being unmounted. It was spinning with this
> > perf top trace:
> > 
> >            553144.00 76.9% writeback_inodes_wb  [kernel.kallsyms]
> >            106434.00 14.8% __ticket_spin_lock   [kernel.kallsyms]
> >             25646.00  3.6% __ticket_spin_unlock [kernel.kallsyms]
> >             10512.00  1.5% _raw_spin_lock       [kernel.kallsyms]
> >              9606.00  1.3% put_super            [kernel.kallsyms]
> >              7920.00  1.1% __put_super          [kernel.kallsyms]
> >              5592.00  0.8% down_read_trylock    [kernel.kallsyms]
> >                46.00  0.0% kfree                [kernel.kallsyms]
> >                22.00  0.0% __do_softirq         [kernel.kallsyms]
> >                19.00  0.0% wb_writeback         [kernel.kallsyms]
> >                16.00  0.0% wb_do_writeback      [kernel.kallsyms]
> >                 8.00  0.0% queue_io             [kernel.kallsyms]
> >                 6.00  0.0% run_timer_softirq    [kernel.kallsyms]
> >                 6.00  0.0% local_bh_enable_ip   [kernel.kallsyms]
> > 
> > This went on for ~7m25s (according to the pmchart trace I had on
> > screen) before something broke the livelock by writing the inodes to
> > disk (maybe the xfssyncd) and the unmount then completed a couple
> > of seconds later.
> > 
> > From the above profile, I'm assuming that writeback_inodes_wb() was
> > seeing pin_sb_for_writeback(sb) fail and moving dirty inodes from
> > the b_io to the b_more_io list, then being called again, splicing
> > the inodes on b_more_io back to b_io, and then failing
> > pin_sb_for_writeback() again for each inode, moving them back to the
> > b_more_io list....
> > 
> > This is on 2.6.36-rc1 + the radix tree fixes for writeback.
  Indeed, your analysis looks correct. The trouble is the following:

  Flusher thread                           Umount
- start processing background writeback
                                           - gets s_umount for writing
                                           - queues syncing work for the flusher
                                           - waits until the flusher thread
                                             gets to it
- loops infinitely, trying to get s_umount
  for reading

In principle, this is a classical ABBA deadlock. There are also more
complicated (and harder to hit) cases, like:

  Flusher thread	  Sync				Remount
- processes background
  writeback
			  - gets s_umount for reading
			  - queues syncing work
			  - waits for syncing work
							- tries to get
							  s_umount for writing
							  and blocks
- now loops infinitely
  since it cannot get
  s_umount for reading anymore

The question is how to resolve this properly. Cases like the second one
above show that it's not enough to just work around writeback during
umount. Also, it's not only background writeback that can get deadlocked
like this, but generally anything submitted via __bdi_start_writeback()
(as these kinds of writeback do not have a superblock specified).
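
For reference, the work item looks roughly like this (from memory of the
2.6.36 code, so the details may be off):

	struct wb_writeback_work {
		long nr_pages;
		struct super_block *sb;	/* NULL for work queued via
					 * __bdi_start_writeback() */
		enum writeback_sync_modes sync_mode;
		unsigned int for_kupdate:1;
		unsigned int range_cyclic:1;
		unsigned int for_background:1;
		struct list_head list;		/* pending work list */
		struct completion *done;	/* set if the caller waits */
	};

When work->sb is NULL, writeback_inodes_wb() has to pin each inode's
superblock itself via pin_sb_for_writeback() (a down_read_trylock on
s_umount), which is exactly the step that livelocks above. Callers such
as writeback_inodes_sb() already hold s_umount and pass their superblock
in work->sb, so they are not affected.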

I think the best resolution of this problem would be to change the work
that is submitted via bdi_start_writeback() (i.e., the work without a
superblock specified, which therefore needs to do the locking itself) to a
"target-based scheme", as Christoph wanted some time ago. I actually have
a patch doing this for background writeback, so I will just modify it to
apply to a wider range of writeback as well. Or Christoph, do you already
have some patches in this direction?
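
Just to sketch the direction (this is not the actual patch, and
target_not_met() below is a made-up placeholder for whatever condition we
end up using, e.g. "dirty pages still over the background threshold"):

	/*
	 * Instead of queueing "write nr_pages" work that the flusher must
	 * consume to completion, loop until a target is met, re-evaluating
	 * that target on every iteration.
	 */
	while (target_not_met(bdi)) {		/* hypothetical helper */
		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
		writeback_inodes_wb(wb, &wbc);
		if (wbc.nr_to_write == MAX_WRITEBACK_PAGES)
			break;			/* made no progress, bail */
	}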

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [2.6.36-rc1] unmount livelock due to racing with bdi-flusher threads
From: Christoph Hellwig @ 2010-10-01  2:59 UTC
  To: Jan Kara; +Cc: Dave Chinner, linux-kernel, linux-fsdevel

On Thu, Sep 30, 2010 at 11:02:51PM +0200, Jan Kara wrote:
> I think the best resolution of this problem would be to change the work
> that is submitted via bdi_start_writeback() (i.e., the work without a
> superblock specified, which therefore needs to do the locking itself) to a
> "target-based scheme", as Christoph wanted some time ago. I actually have
> a patch doing this for background writeback, so I will just modify it to
> apply to a wider range of writeback as well. Or Christoph, do you already
> have some patches in this direction?

I've not started any work on it yet.

