* hung_task checking and sys_sync
@ 2012-06-12 22:09 Daniel Walker
2012-06-12 22:29 ` Mandeep Singh Baines
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Walker @ 2012-06-12 22:09 UTC (permalink / raw)
To: fweisbec; +Cc: msb, sshaiju, mingo, akpm, linux-kernel
I found this commit from a while ago:
commit fb822db465bd9fd4208eef1af4490539b236c54e
Author: Ingo Molnar <mingo@elte.hu>
Date: Wed Aug 20 11:17:40 2008 +0200
softlockup: increase hung tasks check from 2 minutes to 8 minutes
Andrew says:
> Seems that about 100% of the reports we get of this warning triggering
> are sys_sync, transaction commit, etc.
increase the timeout. If it still triggers for people, we can kill it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We're seeing these messages on an older kernel (montavista) but the code areas
appear similar to current kernels. The issue is that we're doing a file copy
which takes 10-15 minutes, and in the background there is a "df --sync"
happening (which is calling sys_sync). We end up getting a hung task message
like the one below,
INFO: task df:1778 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ffffffff81578d40 0000000000000086 ffff8801f6135b00 ffff880269a91800
ffff880269a91800 ffff8802702be000 ffff8801f602a080 0000000000000000
ffff8801f602a440 ffffffff8109c166 ffff8801e863de18 0000000000000004
Call Trace:
[<ffffffff8109c166>] ? sync_page+0x0/0x49
[<ffffffff81320de2>] ? __schedule+0x3c/0x57
[<ffffffff810ea3c7>] ? bdi_sched_wait+0x0/0xe
[<ffffffff81320de2>] ? __schedule+0x3c/0x57
[<ffffffff81320e0d>] ? schedule+0x10/0x1e
[<ffffffff810ea3d0>] ? bdi_sched_wait+0x9/0xe
There is some variation in the stack trace, but it always goes through bdi_sched_wait().
These don't seem like valid warnings, since the copy in progress is known to take
a long time. Has there been any commit that disables these messages for bdi_sched_wait()?
Daniel
* Re: hung_task checking and sys_sync
2012-06-12 22:09 hung_task checking and sys_sync Daniel Walker
@ 2012-06-12 22:29 ` Mandeep Singh Baines
2012-06-12 22:34 ` Daniel Walker
0 siblings, 1 reply; 8+ messages in thread
From: Mandeep Singh Baines @ 2012-06-12 22:29 UTC (permalink / raw)
To: Daniel Walker; +Cc: fweisbec, msb, sshaiju, mingo, akpm, linux-kernel
Daniel Walker (dwalker@fifo99.com) wrote:

Hi Daniel,

> We're seeing these messages on an older kernel (montavista) but the code areas
> appear similar to current kernels. The issue is that we're doing a file copy
> which takes 10-15 minutes, and in the background there is a "df --sync"
> happening (which is calling sys_sync). We end up getting a hung task message
> like the one below,
>
> INFO: task df:1778 blocked for more than 120 seconds.
> [...]
> These don't seem like valid warnings, since the copy in progress is known to take
> a long time.

But the time is not unbounded. You could mask the hung_task_detector for
this case but then you lose the ability to catch bugs in this code path.

The timeout is configurable via /proc/sys/kernel/hung_task_timeout_secs.
Can you bump up the value at boot via sysctl.conf?

> Has there been any commit that disables these messages for bdi_sched_wait()?

No. There is no mechanism to disable hung_task for a specific code path.
We do skip processes if PF_FROZEN or PF_FREEZER_SKIP is set but that is
really a different situation where the wait is unbounded.

Regards,
Mandeep
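As a hedged illustration of the suggestion above to raise the timeout, the following sketch writes a new value to the sysctl file and reads it back. Only the /proc path comes from the thread; the helper names `set_hung_task_timeout()` and `get_hung_task_timeout()` and the `path` parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Sysctl file named in the thread. */
#define HUNG_TASK_TIMEOUT_PATH "/proc/sys/kernel/hung_task_timeout_secs"

/*
 * Hypothetical helper: write a timeout (in seconds) to the given file.
 * Returns 0 on success, -1 on failure. `path` is a parameter only so the
 * helper can also be pointed at an ordinary file.
 */
int set_hung_task_timeout(const char *path, unsigned long secs)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%lu\n", secs) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the timeout back; returns -1 on failure. */
long get_hung_task_timeout(const char *path)
{
    FILE *f;
    long secs = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &secs) != 1)
        secs = -1;
    fclose(f);
    return secs;
}
```

For the 10-15 minute copy described in the thread, a call such as `set_hung_task_timeout(HUNG_TASK_TIMEOUT_PATH, 1200)` (run as root) would put the threshold above the expected wait; the value 1200 is an example, and a sysctl.conf entry remains the usual way to apply it at boot.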
* Re: hung_task checking and sys_sync
2012-06-12 22:29 ` Mandeep Singh Baines
@ 2012-06-12 22:34 ` Daniel Walker
2012-06-12 22:45 ` Mandeep Baines
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Walker @ 2012-06-12 22:34 UTC (permalink / raw)
To: Mandeep Singh Baines; +Cc: fweisbec, msb, sshaiju, mingo, akpm, linux-kernel
On Tue, Jun 12, 2012 at 03:29:12PM -0700, Mandeep Singh Baines wrote:
>
> But the time is not unbounded. You could mask the hung_task_detector for
> this case but then you lose the ability to catch bugs in this code path.
>
> The timeout is configurable via /proc/sys/kernel/hung_task_timeout_secs.
> Can you bump up the value at boot via sysctl.conf?

Maybe, but I'm wondering if warnings of this type should just be stopped because
Andrew had complained about them already.

> > Has there been any commit that disables these messages for bdi_sched_wait()?
>
> No. There is no mechanism to disable hung_task for a specific code path.
> We do skip processes if PF_FROZEN or PF_FREEZER_SKIP is set but that is
> really a different situation where the wait is unbounded.

There is precedent for this type of change,

Author: Mark Lord <kernel@teksavvy.com>
Date: Fri Sep 24 09:51:13 2010 -0400

block: Prevent hang_check firing during long I/O

During long I/O operations, the hang_check timer may fire,
trigger stack dumps that unnecessarily alarm the user.

Eg. hdparm --security-erase NULL /dev/sdb ## can take *hours* to complete

So, if hang_check is armed, we should wake up periodically
to prevent it from triggering. This patch uses a wake-up interval
equal to half the hang_check timer period, which keeps overhead low enough.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* Re: hung_task checking and sys_sync
2012-06-12 22:34 ` Daniel Walker
@ 2012-06-12 22:45 ` Mandeep Baines
2012-06-12 22:57 ` Daniel Walker
2012-06-13 3:02 ` Sadasivan Shaiju
0 siblings, 2 replies; 8+ messages in thread
From: Mandeep Baines @ 2012-06-12 22:45 UTC (permalink / raw)
To: Daniel Walker; +Cc: fweisbec, sshaiju, mingo, akpm, linux-kernel
On Tue, Jun 12, 2012 at 3:34 PM, Daniel Walker <dwalker@fifo99.com> wrote:
> Maybe, but I'm wondering if these types should just be stopped because Andrew
> had complained about them already.

Fair enough. Actually, internally I had a patch where we'd use a task
flag to disable and enable the hang check but the approach in the
patch you pointed me to seems better.

> There is precedent for this type of change,
>
> Author: Mark Lord <kernel@teksavvy.com>
> Date: Fri Sep 24 09:51:13 2010 -0400
>
> block: Prevent hang_check firing during long I/O
> [...]
> So, if hang_check is armed, we should wake up periodically
> to prevent it from triggering. This patch uses a wake-up interval
> equal to half the hang_check timer period, which keeps overhead low enough.

Interesting. I wasn't aware of this patch. Maybe we could abstract
this approach via wait_for_completion_no_hang_check().

Regards,
Mandeep
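To make the periodic wake-up idea in the quoted patch description more concrete, here is a small userspace simulation. It is a sketch under stated assumptions, not kernel code: `struct sim_task`, `sim_check_hung()` and `sim_count_warnings()` are hypothetical names, and the model assumes the checker flags a task whose context-switch count has not changed since the previous check, which is a simplification of what the hung task detector does.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the per-task state a hung-task checker inspects. */
struct sim_task {
    unsigned long switch_count;      /* bumped every time the task is scheduled */
    unsigned long last_switch_count; /* value recorded by the previous check */
};

/* Returns true if the task has not been scheduled since the previous check. */
static bool sim_check_hung(struct sim_task *t)
{
    if (t->switch_count != t->last_switch_count) {
        t->last_switch_count = t->switch_count;
        return false;
    }
    return true;
}

/*
 * Simulate a task blocked for `io_secs` seconds while the checker runs every
 * `timeout` seconds. If `wake_interval` is non-zero, the task's timed wait
 * expires every `wake_interval` seconds, so it is briefly rescheduled before
 * going back to sleep (the patch description uses half the timeout).
 * Returns the number of warnings this simplified checker would emit; the
 * model does not cap the number of warnings.
 */
int sim_count_warnings(unsigned long io_secs, unsigned long timeout,
                       unsigned long wake_interval)
{
    struct sim_task t = { .switch_count = 1, .last_switch_count = 0 };
    int warnings = 0;
    unsigned long now;

    assert(timeout > 0);
    for (now = 1; now <= io_secs; now++) {
        if (wake_interval && now % wake_interval == 0)
            t.switch_count++;   /* timed wait expired: task ran briefly */
        if (now % timeout == 0 && sim_check_hung(&t))
            warnings++;
    }
    return warnings;
}
```

In this model, a 900-second wait with a 120-second timeout and no wake-ups produces warnings, while the same wait with a 60-second wake-up interval produces none. It also shows the concern raised later in the thread: a wait that never completes would stay silent as well.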
* Re: hung_task checking and sys_sync
2012-06-12 22:45 ` Mandeep Baines
@ 2012-06-12 22:57 ` Daniel Walker
2012-06-13 1:03 ` Muthu Kumar
2012-06-13 3:02 ` Sadasivan Shaiju
1 sibling, 1 reply; 8+ messages in thread
From: Daniel Walker @ 2012-06-12 22:57 UTC (permalink / raw)
To: Mandeep Baines; +Cc: fweisbec, sshaiju, mingo, akpm, linux-kernel
On Tue, Jun 12, 2012 at 03:45:20PM -0700, Mandeep Baines wrote:
> Fair enough. Actually, internally I had a patch where we'd use a task
> flag to disable and enable the hang check but the approach in the
> patch you pointed me to seems better.

I'm not really in love with it actually.. It's not ifdef'd for one, but
it's also changing potentially good kernel behavior to avoid warnings.

> Interesting. I wasn't aware of this patch. Maybe we could abstract
> this approach via wait_for_completion_no_hang_check().

Could be .. You could put a stack structure into a list of tasks that
should be ignored prior to the task sleeping. Then when the thread wakes
the stack structure could be removed. Then that list gets checked
during the hung task checking.

Daniel
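The ignore-list idea described above could look roughly like the following userspace sketch. All names here (`struct hang_ignore`, `hang_ignore_add()`, `hang_ignore_del()`, `hang_ignored()`) are hypothetical and not taken from any kernel tree, and the locking a real implementation would need between sleepers and the checker is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ignore-list entry, intended to live on the sleeper's stack. */
struct hang_ignore {
    int pid;
    struct hang_ignore *prev, *next;
};

/* Head of a circular doubly linked list; an empty list points at itself. */
static struct hang_ignore ignore_head = { -1, &ignore_head, &ignore_head };

/* Called by a task just before it starts a long, expected sleep. */
void hang_ignore_add(struct hang_ignore *e, int pid)
{
    assert(e != NULL);
    e->pid = pid;
    e->next = ignore_head.next;
    e->prev = &ignore_head;
    ignore_head.next->prev = e;
    ignore_head.next = e;
}

/* Called by the task once it wakes, before the stack entry goes out of scope. */
void hang_ignore_del(struct hang_ignore *e)
{
    assert(e != NULL);
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = e;
}

/* Called by the checker: should warnings for this pid be skipped? */
bool hang_ignored(int pid)
{
    const struct hang_ignore *e;

    for (e = ignore_head.next; e != &ignore_head; e = e->next)
        if (e->pid == pid)
            return true;
    return false;
}
```

A sleeper would declare a `struct hang_ignore` on its stack, call `hang_ignore_add()` before blocking and `hang_ignore_del()` after waking. Unlike the periodic wake-up approach, the wait itself stays unchanged and only the reporting is suppressed.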
* Re: hung_task checking and sys_sync
2012-06-12 22:57 ` Daniel Walker
@ 2012-06-13 1:03 ` Muthu Kumar
2012-06-13 15:43 ` Daniel Walker
0 siblings, 1 reply; 8+ messages in thread
From: Muthu Kumar @ 2012-06-13 1:03 UTC (permalink / raw)
To: Daniel Walker; +Cc: Mandeep Baines, fweisbec, sshaiju, mingo, akpm, linux-kernel
On Tue, Jun 12, 2012 at 3:57 PM, Daniel Walker <dwalker@fifo99.com> wrote:
> I'm not really in love with it actually.. It's not ifdef'd for one, but
> it's also changing potentially good kernel behavior to avoid warnings.

I totally agree with you (but, not the ifdef part :). The mentioned
change actually was masking a potential problem - see
https://lkml.org/lkml/2012/6/6/483. If not for that change, we would
have got a hung task message for the case where blk_execute_rq() would
have been stuck forever without the completion being called.
* Re: hung_task checking and sys_sync
2012-06-13 1:03 ` Muthu Kumar
@ 2012-06-13 15:43 ` Daniel Walker
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Walker @ 2012-06-13 15:43 UTC (permalink / raw)
To: Muthu Kumar; +Cc: Mandeep Baines, fweisbec, sshaiju, mingo, akpm, linux-kernel
On Tue, Jun 12, 2012 at 06:03:20PM -0700, Muthu Kumar wrote:
> I totally agree with you (but, not the ifdef part :). The mentioned
> change actually was masking a potential problem - see
> https://lkml.org/lkml/2012/6/6/483. If not for that change, we would
> have got hung task message for the case where blk_execute_req() would
> have stuck forever without the completion being called.

Not sure how the link you gave relates here.

The hang checker isn't always part of the kernel i.e. it's configurable.
So this fix doesn't always need to exist, which is what I mean by
ifdef'd ..

Daniel
* RE: hung_task checking and sys_sync
2012-06-12 22:45 ` Mandeep Baines
2012-06-12 22:57 ` Daniel Walker
@ 2012-06-13 3:02 ` Sadasivan Shaiju
1 sibling, 0 replies; 8+ messages in thread
From: Sadasivan Shaiju @ 2012-06-13 3:02 UTC (permalink / raw)
To: Mandeep Baines, Daniel Walker; +Cc: fweisbec, mingo, akpm, linux-kernel

There was another patch addressing this type of issue:

https://lkml.org/lkml/2009/1/12/18

regards,
shaiju.