* xfs_iomap_write_unwritten stuck in congestion_wait?
@ 2013-04-03 19:33 Peter Watkins
2013-04-04 4:00 ` Dave Chinner
0 siblings, 1 reply; 4+ messages in thread
From: Peter Watkins @ 2013-04-03 19:33 UTC (permalink / raw)
To: xfs
Hello,
Wondering if anyone has a suggestion for when
xfs_iomap_write_unwritten gets into congestion_wait.
In this case the system has almost half of normal zone pages in
NR_WRITEBACK with pretty much everybody held up in either
congestion_wait or balance_dirty_pages.
Since there are some free pages, seems like we'd be better off just
using a little more memory to finish this IO and in turn reduce pages
under write-back and add to free memory, rather than holding up here.
So maybe PF_MEMALLOC?
It also looks like this path allocates log vectors with KM_SLEEP but
lv_buf's with KM_SLEEP|KM_NOFS. Why is that?
PID: 7011 TASK: ffff880226282040 CPU: 2 COMMAND: "xfsconvertd/2"
#0 [ffff88022629b550] schedule at ffffffff814f5862
#1 [ffff88022629b618] schedule_timeout at ffffffff814f66a2
#2 [ffff88022629b6c8] io_schedule_timeout at ffffffff814f532f
#3 [ffff88022629b6f8] congestion_wait at ffffffff81137450
#4 [ffff88022629b758] throttle_vm_writeout at ffffffff81128c78
#5 [ffff88022629b798] shrink_zone at ffffffff8112ea3b
#6 [ffff88022629b848] do_try_to_free_pages at ffffffff8112ecfe
#7 [ffff88022629b8d8] try_to_free_pages at ffffffff8112f30d
#8 [ffff88022629b988] __alloc_pages_nodemask at ffffffff81126797
#9 [ffff88022629ba98] kmem_getpages at ffffffff8115db12
#10 [ffff88022629bac8] fallback_alloc at ffffffff8115e72a
#11 [ffff88022629bb48] ____cache_alloc_node at ffffffff8115e4a9
#12 [ffff88022629bba8] __kmalloc at ffffffff8115f0d9
#13 [ffff88022629bbf8] kmem_alloc at ffffffffa02d69f7
#14 [ffff88022629bc38] xfs_log_commit_cil at ffffffffa02c3ebd
#15 [ffff88022629bcb8] _xfs_trans_commit at ffffffffa02cfe99
#16 [ffff88022629bd18] xfs_iomap_write_unwritten at ffffffffa02bce01
#17 [ffff88022629be18] xfs_end_io at ffffffffa02d72bb
#18 [ffff88022629be38] worker_thread at ffffffff8108c6a0
#19 [ffff88022629bee8] kthread at ffffffff81091ca6
#20 [ffff88022629bf48] kernel_thread at ffffffff8100c14a
Apologies in advance, this is an older kernel (2.6.32-279) but has
many more recent patches (thank-you!)
-Peter
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: xfs_iomap_write_unwritten stuck in congestion_wait?
2013-04-03 19:33 xfs_iomap_write_unwritten stuck in congestion_wait? Peter Watkins
@ 2013-04-04 4:00 ` Dave Chinner
2013-04-04 15:50 ` Peter Watkins
0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2013-04-04 4:00 UTC (permalink / raw)
To: Peter Watkins; +Cc: xfs
On Wed, Apr 03, 2013 at 03:33:11PM -0400, Peter Watkins wrote:
> Hello,
>
> Wondering if anyone has a suggestion for when
> xfs_iomap_write_unwritten gets into congestion_wait.
Do less IO?
> In this case the system has almost half of normal zone pages in
> NR_WRITEBACK with pretty much everybody held up in either
> congestion_wait or balance_dirty_pages.
Which is excessive - how are you getting to the point of having that
many pages under IO at once? Writeback depth is limited by the IO
elevator queue depths, so this shouldn't happen unless you've been
tweaking block device parameters (i.e. nr_requests/max_sectors_kb)...
> Since there are some free pages, seems like we'd be better off just
> using a little more memory to finish this IO and in turn reduce pages
> under write-back and add to free memory, rather than holding up here.
> So maybe PF_MEMALLOC?
Definitely not. Unwritten extent conversion can require hundreds of
kilobytes of memory to complete, so all this will do is trigger even
further exhaustion of memory reserves before we block on IO.
> It also looks like this path allocates log vectors with KM_SLEEP but
> lv_buf's with KM_SLEEP|KM_NOFS. Why is that?
The transaction commit is copying the changes made into separate
buffers to insert into the CIL for a later checkpoint to write to
disk. This is normal behaviour - we can sleep there, but we cannot
allow memory reclaim to recurse into the filesystem (for obvious
reasons).
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs_iomap_write_unwritten stuck in congestion_wait?
2013-04-04 4:00 ` Dave Chinner
@ 2013-04-04 15:50 ` Peter Watkins
2013-04-04 20:25 ` Dave Chinner
0 siblings, 1 reply; 4+ messages in thread
From: Peter Watkins @ 2013-04-04 15:50 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, Apr 4, 2013 at 12:00 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Apr 03, 2013 at 03:33:11PM -0400, Peter Watkins wrote:
>> Hello,
>>
>> Wondering if anyone has a suggestion for when
>> xfs_iomap_write_unwritten gets into congestion_wait.
>
> Do less IO?
>
>> In this case the system has almost half of normal zone pages in
>> NR_WRITEBACK with pretty much everybody held up in either
>> congestion_wait or balance_dirty_pages.
>
> Which is excessive - how are you getting to the point of having that
> many pages under IO at once? Writeback depth is limited by the IO
> elevator queue depths, so this shouldn't happen unless you've been
> tweaking block device parameters (i.e. nr_requests/max_sectors_kb)...
>
>> Since there are some free pages, seems like we'd be better off just
>> using a little more memory to finish this IO and in turn reduce pages
>> under write-back and add to free memory, rather than holding up here.
>> So maybe PF_MEMALLOC?
>
> Definitely not. Unwritten extent conversion can require hundreds of
> kilobytes of memory to complete, so all this will do is trigger even
> further exhaustion of memory reserves before we block on IO.
>
>> It also looks like this path allocates log vectors with KM_SLEEP but
>> lv_buf's with KM_SLEEP|KM_NOFS. Why is that?
>
> The transaction commit is copying the changes made into separate
> buffers to insert into the CIL for a later checkpoint to write to
> disk. This is normal behaviour - we can sleep there, but we cannot
> allow memory reclaim to recurse into the filesystem (for obvious
> reasons).
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Thanks for the help.
There are other clues the VM system was rather quickly overwhelmed,
e.g. it couldn't even get bdi flush threads started without sending
kthreadd into congestion_wait.
So indeed there is a big multi-threaded writer which starts all at
once, and that can be smoothed out.
And nr_requests is dialed up from 128 to 1024. Is anyone really able
to resist that temptation?
-Peter
* Re: xfs_iomap_write_unwritten stuck in congestion_wait?
2013-04-04 15:50 ` Peter Watkins
@ 2013-04-04 20:25 ` Dave Chinner
0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2013-04-04 20:25 UTC (permalink / raw)
To: Peter Watkins; +Cc: xfs
On Thu, Apr 04, 2013 at 11:50:15AM -0400, Peter Watkins wrote:
> On Thu, Apr 4, 2013 at 12:00 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Wed, Apr 03, 2013 at 03:33:11PM -0400, Peter Watkins wrote:
> >> Hello,
> >>
> >> Wondering if anyone has a suggestion for when
> >> xfs_iomap_write_unwritten gets into congestion_wait.
> >
> > Do less IO?
> >
> >> In this case the system has almost half of normal zone pages in
> >> NR_WRITEBACK with pretty much everybody held up in either
> >> congestion_wait or balance_dirty_pages.
> >
> > Which is excessive - how are you getting to the point of having that
> > many pages under IO at once? Writeback depth is limited by the IO
> > elevator queue depths, so this shouldn't happen unless you've been
> > tweaking block device parameters (i.e. nr_requests/max_sectors_kb)...
> >
> >> Since there are some free pages, seems like we'd be better off just
> >> using a little more memory to finish this IO and in turn reduce pages
> >> under write-back and add to free memory, rather than holding up here.
> >> So maybe PF_MEMALLOC?
> >
> > Definitely not. Unwritten extent conversion can require hundreds of
> > kilobytes of memory to complete, so all this will do is trigger even
> > further exhaustion of memory reserves before we block on IO.
> >
> >> It also looks like this path allocates log vectors with KM_SLEEP but
> >> lv_buf's with KM_SLEEP|KM_NOFS. Why is that?
> >
> > The transaction commit is copying the changes made into separate
> > buffers to insert into the CIL for a later checkpoint to write to
> > disk. This is normal behaviour - we can sleep there, but we cannot
> > allow memory reclaim to recurse into the filesystem (for obvious
> > reasons).
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
>
> Thanks for the help.
>
> There are other clues the VM system was rather quickly overwhelmed,
> e.g. it couldn't even get bdi flush threads started without sending
> kthreadd into congestion_wait.
Yeah, that's a sure sign that you've overloaded the system with
dirty pages.
> So indeed there is a big multi-threaded writer which starts all at
> once, and that can be smoothed out.
>
> And nr_requests is dialed up from 128 to 1024. Is anyone really able
> to resist that temptation?
I haven't had to do this on a system to get decent write performance
for years. And in general, the deepest IO parallelism you can get
from SAS/SCSI/FC hardware devices is around 240 IOs, so going deeper
than that doesn't buy you a whole lot except for queuing up lots of
IO and causing high IO latencies.
FWIW, on HW RAID the BBWC is where all the significant IO
aggregation and reordering takes place, not the IO elevator. The
BBWC has a much bigger window for reordering than the elevator, and
doesn't cause any nasty interactions with the VM by being large...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com