Chronic resource starvation.

public inbox for linux-kernel@vger.kernel.org

* Chronic resource starvation.
@ 2012-01-14  3:17 Mike Mestnik
  2012-01-14 20:21 ` Mike Mestnik
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Mestnik @ 2012-01-14  3:17 UTC (permalink / raw)
  To: linux-kernel

I've dealt with applications taking extended time off for a number of
years.  I typically attribute it to applications being overly zealous
about eating memory, as most every application tends to do these days.  I
had always figured that there are likely plenty of people complaining, and I
didn't want to get the boilerplate answer that resources are cheap.

I refuse to buy into the idea that Z computers can each get an additional Y
resource to run application X, when instead application X could be
engineered once and for all.  This ideology is not sustainable and
eventually will crash upon itself.  I call this Z * Y < X.  The
application source becomes the single location where every computer's
resources can be increased, at much less cost than adjusting the
running environment of every location where the code may run.

Here is an 84MB video that demonstrates the issue.
http://j.mp/wavbCO
http://bitly.com/wavbCO+

You can see at the start an application that should update regularly is
frozen and that moving windows and even focus is difficult.  The system
does recover, but after far too long.

I'll welcome any further testing as I'm sure it's needed.  Keep in mind
that I can't reproduce this reliably, but it does happen, so it will
take time to collect any amount of data...  A gkrellm plugin that
illustrates the test would be preferred, killing two birds with one
stone as it will help with this case and many future cases.  If
it's important, wouldn't your test already be part of a tool like gkrellm?


* Re: Chronic resource starvation.
  2012-01-14  3:17 Chronic resource starvation Mike Mestnik
@ 2012-01-14 20:21 ` Mike Mestnik
  2012-01-15 12:08   ` Mike Galbraith
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Mestnik @ 2012-01-14 20:21 UTC (permalink / raw)
  To: linux-kernel

On 01/13/12 21:17, Mike Mestnik wrote:
> I've dealt with applications taking extended time off for a number of
> years.  I typically attribute it to applications being overly zealous
> about eating memory, as most every application tends to do these days.  I
> had always figured that there are likely plenty of people complaining, and I
> didn't want to get the boilerplate answer that resources are cheap.
>
> I refuse to buy into the idea that Z computers can each get an additional Y
> resource to run application X, when instead application X could be
> engineered once and for all.  This ideology is not sustainable and
> eventually will crash upon itself.  I call this Z * Y < X.  The
> application source becomes the single location where every computer's
> resources can be increased, at much less cost than adjusting the
> running environment of every location where the code may run.
>
> Here is an 84MB video that demonstrates the issue.
> http://j.mp/wavbCO
> http://bitly.com/wavbCO+
I'm glad to see a number of you have clicked on this link.

Does this behavior look normal or is it just my system?  If it is normal
how difficult would it be to make corrections and would those
corrections likely be kernel or application related?
> You can see at the start an application that should update regularly is
> frozen and that moving windows and even focus is difficult.  The system
> does recover, but after far too long.
>
> I'll welcome any further testing as I'm sure it's needed.  Keep in mind
> that I can't reproduce this reliably, but it does happen, so it will
> take time to collect any amount of data...  A gkrellm plugin that
> illustrates the test would be preferred, killing two birds with one
> stone as it will help with this case and many future cases.  If
> it's important, wouldn't your test already be part of a tool like gkrellm?
>



* Re: Chronic resource starvation.
  2012-01-14 20:21 ` Mike Mestnik
@ 2012-01-15 12:08   ` Mike Galbraith
  2012-01-15 19:01     ` Mike Mestnik
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Galbraith @ 2012-01-15 12:08 UTC (permalink / raw)
  To: Mike Mestnik; +Cc: linux-kernel

On Sat, 2012-01-14 at 14:21 -0600, Mike Mestnik wrote:
> On 01/13/12 21:17, Mike Mestnik wrote:
> > I've dealt with applications taking extended time off for a number of
> > years.  I typically attribute it to applications being overly zealous
> > about eating memory, as most every application tends to do these days.  I
> > had always figured that there are likely plenty of people complaining, and I
> > didn't want to get the boilerplate answer that resources are cheap.
> >
> > I refuse to buy into the idea that Z computers can each get an additional Y
> > resource to run application X, when instead application X could be
> > engineered once and for all.  This ideology is not sustainable and
> > eventually will crash upon itself.  I call this Z * Y < X.  The
> > application source becomes the single location where every computer's
> > resources can be increased, at much less cost than adjusting the
> > running environment of every location where the code may run.
> >
> > Here is an 84MB video that demonstrates the issue.
> > http://j.mp/wavbCO
> > http://bitly.com/wavbCO+
> I'm glad to see a number of you have clicked on this link.
> 
> Does this behavior look normal or is it just my system?  If it is normal
> how difficult would it be to make corrections and would those
> corrections likely be kernel or application related?

That "Backup complete" makes me suspect a classic case of IO-itis.  If
bits of your GUI were pushed out of RAM (or weren't previously used),
and live on a disk you're beating hell out of, you get to experience
horrid interactivity while those missing bits are being retrieved.

	-Mike



* Re: Chronic resource starvation.
  2012-01-15 12:08   ` Mike Galbraith
@ 2012-01-15 19:01     ` Mike Mestnik
  2012-01-16  2:39       ` Mike Galbraith
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Mestnik @ 2012-01-15 19:01 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

On 01/15/12 06:08, Mike Galbraith wrote:
> On Sat, 2012-01-14 at 14:21 -0600, Mike Mestnik wrote:
>> On 01/13/12 21:17, Mike Mestnik wrote:
>>> I've dealt with applications taking extended time off for a number of
>>> years.  I typically attribute it to applications being overly zealous
>>> about eating memory, as most every application tends to do these days.  I
>>> had always figured that there are likely plenty of people complaining, and I
>>> didn't want to get the boilerplate answer that resources are cheap.
>>>
>>> I refuse to buy into the idea that Z computers can each get an additional Y
>>> resource to run application X, when instead application X could be
>>> engineered once and for all.  This ideology is not sustainable and
>>> eventually will crash upon itself.  I call this Z * Y < X.  The
>>> application source becomes the single location where every computer's
>>> resources can be increased, at much less cost than adjusting the
>>> running environment of every location where the code may run.
>>>
>>> Here is an 84MB video that demonstrates the issue.
>>> http://j.mp/wavbCO
>>> http://bitly.com/wavbCO+
>> I'm glad to see a number of you have clicked on this link.
>>
>> Does this behavior look normal or is it just my system?  If it is normal
>> how difficult would it be to make corrections and would those
>> corrections likely be kernel or application related?
> That "Backup complete" makes me suspect a classic case of IO-itis.  If
> bits of your GUI were pushed out of RAM (or weren't previously used),
> and live on a disk you're beating hell out of, you get to experience
> horrid interactivity while those missing bits are being retrieved.
Thank you for the reply!

That's DVDShrink, which copies from an optical drive (slow) to disk (fast-ish?).
http://www.dvdshrink.info/

That wouldn't explain why the monitor (gkrellm) stopped updating; every
part of that is polling like top would.  Gkrellm indicates disk activity,
and it shows the backup complete on sr1 and a backup in progress on sr0;
however, neither of these operations registers on Disk or sda.

I wouldn't suspect that 16MB/s could saturate a hard disk; perhaps I can
buffer this better?

> 	-Mike
>
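
For reference, the kind of polling a monitor does for disk throughput can
be sketched as follows.  This is only an illustration, assuming the
counters come from /proc/diskstats (whether gkrellm reads exactly this
file is an assumption); the function names are hypothetical.  It takes two
samples of the per-device sector counters and converts the difference into
MB/s, which is enough to check whether a 16MB/s copy shows up on sda.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # fields: major minor name reads merged sectors_read ms
        #         writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, seconds):
    """Per-device (read MB/s, write MB/s) between two samples."""
    rates = {}
    for name, (r1, w1) in after.items():
        r0, w0 = before.get(name, (r1, w1))
        rates[name] = (
            (r1 - r0) * SECTOR_BYTES / seconds / 1e6,
            (w1 - w0) * SECTOR_BYTES / seconds / 1e6,
        )
    return rates


def read_sample(path="/proc/diskstats"):
    """Read and parse one snapshot of the counters."""
    with open(path) as f:
        return parse_diskstats(f.read())


def monitor(samples=5, interval=1.0):
    """Print throughput for active devices a fixed number of times."""
    prev = read_sample()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_sample()
        for dev, (rd, wr) in sorted(throughput_mb_s(prev, cur, interval).items()):
            if rd or wr:
                print(f"{dev}: read {rd:.1f} MB/s, write {wr:.1f} MB/s")
        prev = cur
```

Calling monitor() while the backup runs would show whether sr0 and sda
carry the traffic one expects.  If the sampling loop itself stalls, the
gaps between printed lines are a data point too.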



* Re: Chronic resource starvation.
  2012-01-15 19:01     ` Mike Mestnik
@ 2012-01-16  2:39       ` Mike Galbraith
  2012-01-16  5:43         ` Mike Mestnik
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Galbraith @ 2012-01-16  2:39 UTC (permalink / raw)
  To: Mike Mestnik; +Cc: linux-kernel

On Sun, 2012-01-15 at 13:01 -0600, Mike Mestnik wrote:
> On 01/15/12 06:08, Mike Galbraith wrote:

> >> Does this behavior look normal or is it just my system?  If it is normal
> >> how difficult would it be to make corrections and would those
> >> corrections likely be kernel or application related?
> > That "Backup complete" makes me suspect a classic case of IO-itis.  If
> > bits of your GUI were pushed out of RAM (or weren't previously used),
> > and live on a disk you're beating hell out of, you get to experience
> > horrid interactivity while those missing bits are being retrieved.
> Thank you for the reply!
> 
> That's DVDShrink, copies from optical drive(slow) to disk(fast-ish?).
> http://www.dvdshrink.info/
> 
> That wouldn't explain why the monitor(gkrellm) stopped updating, every
> part of that is polling like top would.  Gkrellm indicates disk activity
> and it shows the backup complete on sr1 and a backup in progress on sr0,
> however neither of these operations rate on Disk or sda.

No, dvdshrink wouldn't do enough IO to matter much.  Heavy IO can (or
could) cause symptoms like your video though, all of it.  I used to be
able to trigger loss of GUI control for 30 minutes at a time.  All it
took was something poking fdatasync after a large streaming IO had been
running for a while.

Another IO thing that can cause horrible interactivity is IO error
recovery.  That can (or again, could) make a box lurch like hell.  If
that were the cause, you'd have kernel gripes in dmesg output.

	-Mike
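
The fdatasync scenario described above can be probed with a small timing
sketch.  This is a minimal illustration under stated assumptions, not a
reproduction of the original stalls: the function name, file sizes, and
sync interval are all hypothetical, and on a healthy modern system the
measured latencies may be small.  It streams data to a file and records
how long each periodic fdatasync call takes.

```python
import os
import time


def time_fdatasync(path, total_mb=64, chunk_mb=4, sync_every=4):
    """Stream total_mb of zeros to path, timing each periodic fdatasync.

    Returns a list of fdatasync latencies in seconds.
    """
    # Fall back to fsync on platforms where os.fdatasync is unavailable.
    sync = getattr(os, "fdatasync", os.fsync)
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    latencies = []
    with open(path, "wb") as f:
        for i in range(1, total_mb // chunk_mb + 1):
            f.write(chunk)
            if i % sync_every == 0:
                f.flush()  # push Python's buffer to the kernel first
                start = time.monotonic()
                sync(f.fileno())
                latencies.append(time.monotonic() - start)
    return latencies
```

Running this against a file on the disk that holds the desktop's
binaries, with a much larger total_mb, and watching for latencies of
several seconds would be one way to collect data on the stalls.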



* Re: Chronic resource starvation.
  2012-01-16  2:39       ` Mike Galbraith
@ 2012-01-16  5:43         ` Mike Mestnik
  2012-01-16  5:51           ` Mike Galbraith
  2012-01-16  6:07           ` Mike Galbraith
  0 siblings, 2 replies; 8+ messages in thread
From: Mike Mestnik @ 2012-01-16  5:43 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

On 01/15/12 20:39, Mike Galbraith wrote:
> On Sun, 2012-01-15 at 13:01 -0600, Mike Mestnik wrote:
>> On 01/15/12 06:08, Mike Galbraith wrote:
>>>> Does this behavior look normal or is it just my system?  If it is normal
>>>> how difficult would it be to make corrections and would those
>>>> corrections likely be kernel or application related?
>>> That "Backup complete" makes me suspect a classic case of IO-itis.  If
>>> bits of your GUI were pushed out of RAM (or weren't previously used),
>>> and live on a disk you're beating hell out of, you get to experience
>>> horrid interactivity while those missing bits are being retrieved.
>> Thank you for the reply!
>>
>> That's DVDShrink, copies from optical drive(slow) to disk(fast-ish?).
>> http://www.dvdshrink.info/
>>
>> That wouldn't explain why the monitor(gkrellm) stopped updating, every
>> part of that is polling like top would.  Gkrellm indicates disk activity
>> and it shows the backup complete on sr1 and a backup in progress on sr0,
>> however neither of these operations rate on Disk or sda.
> No, dvdshrink wouldn't do enough IO to matter much.  Heavy IO can (or
> could) cause symptoms like your video though, all of it.  I used to be
> able to trigger loss of GUI control for 30 minutes at a time.  All it
> took was something poking fdatasync after a large streaming IO had been
> running for a while.
Ahhh, as I had suspected I'm not the only one.

Does fdatasync still cause this problem?  I'm sure there must be
some way to 'group' applications together that should be allowed to
avoid this effect and even insert IO requests during the operation if
given a high enough priority.
>
> Another IO thing that can cause horrible interactivity is IO error
> recovery.  That can (or again, could) make a box lurch like hell.  If
> that were the cause, you'd have kernel gripes in dmesg output.
Sure, this happens as a 'feature' of commercial DVDs and sometimes by
accident:

[5636213.987232] sr 4:0:0:0: [sr0] Device not ready
[5636213.987235] sr 4:0:0:0: [sr0]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[5636213.987238] sr 4:0:0:0: [sr0]  Sense Key : Not Ready [current]
[5636213.987240] sr 4:0:0:0: [sr0]  Add. Sense: Medium not present - tray closed
[5636213.987243] sr 4:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[5636213.987248] end_request: I/O error, dev sr0, sector 0
[5636213.987251] Buffer I/O error on device sr0, logical block 0
[5636213.987253] Buffer I/O error on device sr0, logical block 1
[5636213.987260] Buffer I/O error on device sr0, logical block 2
[5636213.987261] Buffer I/O error on device sr0, logical block 3
[5636213.987263] Buffer I/O error on device sr0, logical block 4
[5636213.987265] Buffer I/O error on device sr0, logical block 5
[5636213.987266] Buffer I/O error on device sr0, logical block 6
[5636213.987267] Buffer I/O error on device sr0, logical block 7
[5636213.997003] sr 4:0:0:0: [sr0] Device not ready
[5636213.997005] sr 4:0:0:0: [sr0]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[5636213.997007] sr 4:0:0:0: [sr0]  Sense Key : Not Ready [current]
[5636213.997010] sr 4:0:0:0: [sr0]  Add. Sense: Medium not present - tray closed
[5636213.997012] sr 4:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 00 00 00 00 02 00
[5636213.997016] end_request: I/O error, dev sr0, sector 0
[5636213.997018] Buffer I/O error on device sr0, logical block 0
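
To judge whether errors like these line up with the stalls, they can be
tallied per device.  The following is a rough sketch; the function name is
hypothetical, and the two patterns only match the message wording shown in
the log above, which other kernel versions may phrase differently.

```python
import re
from collections import Counter

# Patterns follow the two error formats in the log above (an assumption
# about wording; adjust for other kernel versions).
BUFFER_ERR = re.compile(r"Buffer I/O error on device (\S+), logical block (\d+)")
REQUEST_ERR = re.compile(r"end_request: I/O error, dev (\S+), sector (\d+)")


def count_io_errors(log_text):
    """Count I/O error lines per device in dmesg-style text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BUFFER_ERR.search(line) or REQUEST_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the output of dmesg captured around the time of a stall gives a
quick per-device count to compare against the times the GUI froze.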


> 	-Mike
>



* Re: Chronic resource starvation.
  2012-01-16  5:43         ` Mike Mestnik
@ 2012-01-16  5:51           ` Mike Galbraith
  2012-01-16  6:07           ` Mike Galbraith
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Galbraith @ 2012-01-16  5:51 UTC (permalink / raw)
  To: Mike Mestnik; +Cc: linux-kernel

On Sun, 2012-01-15 at 23:43 -0600, Mike Mestnik wrote:

> Ahhh, as I had suspected I'm not the only one.

Far from it, IO related stalls have come up many times over the years.

> Does fdatasync still cause this problem?  I'm sure there must be
> some way to 'group' applications together that should be allowed to
> avoid this effect and even insert IO requests during the operation if
> given a high enough priority.

A lot of work was done, and it improved things a LOT, but beating hell
out of your interactive application's home has always been a bad idea,
and will likely always be so.

	-Mike



* Re: Chronic resource starvation.
  2012-01-16  5:43         ` Mike Mestnik
  2012-01-16  5:51           ` Mike Galbraith
@ 2012-01-16  6:07           ` Mike Galbraith
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Galbraith @ 2012-01-16  6:07 UTC (permalink / raw)
  To: Mike Mestnik; +Cc: linux-kernel

On Sun, 2012-01-15 at 23:43 -0600, Mike Mestnik wrote:

> Does fdatasync still cause this problem?  I'm sure there must be
> some way to 'group' applications together that should be allowed to
> avoid this effect and even insert IO requests during the operation if
> given a high enough priority.

And btw, that resource prioritization exists now for both CPU and IO
resources, look into cgroups and ionice for non-cgroup IO.  For the CPU
scheduler, you can automate task grouping by enabling SCHED_AUTOGROUP,
and some distros use a userland automation solution via systemd.  I
don't know if systemd does IO group automation, but groups won't help
defective spinning media anyway.

	-Mike
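
As a small illustration of the knobs mentioned above, the sketch below
builds an ionice command line and checks the autogroup sysctl.  The helper
names are hypothetical.  The ionice class numbers (1 realtime,
2 best-effort, 3 idle) and the /proc/sys/kernel/sched_autogroup_enabled
path are the standard ones, but whether the idle class has any effect
depends on the I/O scheduler in use, so treat this as a starting point
rather than a fix.

```python
import subprocess

# ionice scheduling classes as accepted by "ionice -c".
IO_CLASSES = {"realtime": "1", "best-effort": "2", "idle": "3"}


def ionice_command(command, io_class="idle"):
    """Return an argv list that runs command under the given I/O class."""
    return ["ionice", "-c", IO_CLASSES[io_class], *command]


def run_with_io_class(command, io_class="idle"):
    """Run command via ionice; requires the ionice binary to be installed."""
    return subprocess.run(ionice_command(command, io_class), check=False)


def autogroup_enabled(path="/proc/sys/kernel/sched_autogroup_enabled"):
    """Return True/False for the autogroup sysctl, or None if unavailable."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return None  # kernel built without SCHED_AUTOGROUP, or not Linux
```

For example, run_with_io_class(["cp", "big.iso", "/mnt/backup/"]) would
start the copy in the idle class (the file names are placeholders), so
that interactive applications on the same disk get served first where the
scheduler honors I/O priorities.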



Thread overview: 8+ messages
2012-01-14  3:17 Chronic resource starvation Mike Mestnik
2012-01-14 20:21 ` Mike Mestnik
2012-01-15 12:08   ` Mike Galbraith
2012-01-15 19:01     ` Mike Mestnik
2012-01-16  2:39       ` Mike Galbraith
2012-01-16  5:43         ` Mike Mestnik
2012-01-16  5:51           ` Mike Galbraith
2012-01-16  6:07           ` Mike Galbraith
