Re: process hangs on do_exit when oom happens


From: Michal Hocko <mhocko@suse.cz>
To: Qiang Gao <gaoqiangscut@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: process hangs on do_exit when oom happens
Date: Tue, 23 Oct 2012 11:50:28 +0200
Message-ID: <20121023095028.GD15397@dhcp22.suse.cz>
In-Reply-To: <CAKWKT+bYOf0cEDuiibf6eV2raMxe481y-D+nrBgPWR3R+53zvg@mail.gmail.com>

On Tue 23-10-12 15:18:48, Qiang Gao wrote:
> This process was moved to the RT-priority queue when the global oom-killer
> ran, in order to speed up the recovery of the system.

Who did that? The OOM killer doesn't boost the priority (scheduling class)
AFAIK.
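
One way to check which scheduling class a task is actually in is to query
its policy from user space. The following is only a minimal sketch: the
helper names describe_policy and task_policy are made up for illustration,
and the policy numbers are the usual Linux values, hard-coded here as an
assumption rather than read from the running system.

```python
import os

# Linux scheduling policy numbers (assumed; these are the usual values
# from the kernel's sched.h).
POLICY_NAMES = {
    0: "SCHED_OTHER",
    1: "SCHED_FIFO",
    2: "SCHED_RR",
    3: "SCHED_BATCH",
    5: "SCHED_IDLE",
}
RT_POLICIES = {1, 2}              # SCHED_FIFO and SCHED_RR
SCHED_RESET_ON_FORK = 0x40000000  # flag that may be OR-ed into the policy


def describe_policy(policy):
    """Return (policy name, is_realtime) for a raw policy number."""
    policy &= ~SCHED_RESET_ON_FORK
    name = POLICY_NAMES.get(policy, "unknown(%d)" % policy)
    return name, policy in RT_POLICIES


def task_policy(pid):
    """Query the policy of a live task (Linux only)."""
    return describe_policy(os.sched_getscheduler(pid))
```

Calling task_policy(pid) on the stuck task would show whether something
switched it to SCHED_FIFO or SCHED_RR; it does not tell you who did it.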

> but it wasn't properly dealt with. I still have no idea where
> the problem is.

Well, your configuration says that there is no RT runtime reserved for the
group (cpu.rt_runtime_us is 0), so an RT task in that group is throttled
and does not get to run.
Please refer to Documentation/scheduler/sched-rt-group.txt for more
information.
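
A rough sketch of how that configuration could be checked for a given
group. The function names and the cgroup directory argument are
hypothetical; only the two file names, cpu.rt_period_us and
cpu.rt_runtime_us, come from this thread. Treating a negative runtime as
"no limit" is an assumption of this sketch.

```python
from pathlib import Path


def read_rt_settings(cgroup_dir):
    """Read (period_us, runtime_us) from a cpu cgroup directory."""
    d = Path(cgroup_dir)
    period = int((d / "cpu.rt_period_us").read_text().strip())
    runtime = int((d / "cpu.rt_runtime_us").read_text().strip())
    return period, runtime


def rt_tasks_can_run(period_us, runtime_us):
    """True if RT tasks in the group get any CPU time per period.

    A runtime of 0 means nothing is reserved, so RT tasks are throttled.
    A negative runtime is treated as "no limit" here (assumption).
    """
    return runtime_us != 0


def rt_share(period_us, runtime_us):
    """Fraction of each period available to RT tasks, or None if unlimited."""
    if runtime_us < 0:
        return None
    return runtime_us / period_us
```

With the values reported in this thread (period 1000000, runtime 0),
rt_tasks_can_run returns False, which matches the throttled rt_rq in the
quoted sched_debug output.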

> On Tue, Oct 23, 2012 at 12:40 PM, Balbir Singh <bsingharora@gmail.com> wrote:
> > On Tue, Oct 23, 2012 at 9:05 AM, Qiang Gao <gaoqiangscut@gmail.com> wrote:
> >> Information about the system is in the attached file "information.txt".
> >>
> >> I cannot reproduce it with the upstream 3.6.0 kernel.
> >>
> >> On Sat, Oct 20, 2012 at 12:04 AM, Michal Hocko <mhocko@suse.cz> wrote:
> >>> On Wed 17-10-12 18:23:34, gaoqiang wrote:
> >>>> I found nothing useful with Google, so I'm here for help.
> >>>>
> >>>> When this happens: I use memcg to limit the memory use of a
> >>>> process, and when the memcg cgroup ran out of memory,
> >>>> the process was oom-killed. However, it cannot really complete
> >>>> exiting. Here is some information:
> >>>
> >>> How many tasks are in the group and what kind of memory do they use?
> >>> Is it possible that you were hit by the same issue as described in
> >>> 79dfdacc memcg: make oom_lock 0 and 1 based rather than counter?
> >>>
> >>>> OS version:  centos6.2    2.6.32.220.7.1
> >>>
> >>> Your kernel is quite old and you should probably be asking your
> >>> distribution to help you out. There have been many fixes since 2.6.32.
> >>> Are you able to reproduce the same issue with the current vanilla kernel?
> >>>
> >>>> /proc/pid/stack
> >>>> ---------------------------------------------------------------
> >>>>
> >>>> [<ffffffff810597ca>] __cond_resched+0x2a/0x40
> >>>> [<ffffffff81121569>] unmap_vmas+0xb49/0xb70
> >>>> [<ffffffff8112822e>] exit_mmap+0x7e/0x140
> >>>> [<ffffffff8105b078>] mmput+0x58/0x110
> >>>> [<ffffffff81061aad>] exit_mm+0x11d/0x160
> >>>> [<ffffffff81061c9d>] do_exit+0x1ad/0x860
> >>>> [<ffffffff81062391>] do_group_exit+0x41/0xb0
> >>>> [<ffffffff81077cd8>] get_signal_to_deliver+0x1e8/0x430
> >>>> [<ffffffff8100a4c4>] do_notify_resume+0xf4/0x8b0
> >>>> [<ffffffff8100b281>] int_signal+0x12/0x17
> >>>> [<ffffffffffffffff>] 0xffffffffffffffff
> >>>
> >>> This looks strange because this is just the exit path, which shouldn't
> >>> deadlock or anything. Is this stack stable? Have you tried checking
> >>> it several times?
> >
> > Looking at information.txt, I found something interesting
> >
> > rt_rq[0]:/1314
> >   .rt_nr_running                 : 1
> >   .rt_throttled                  : 1
> >   .rt_time                       : 0.856656
> >   .rt_runtime                    : 0.000000
> >
> >
> > cfs_rq[0]:/1314
> >   .exec_clock                    : 8738.133429
> >   .MIN_vruntime                  : 0.000001
> >   .min_vruntime                  : 8739.371271
> >   .max_vruntime                  : 0.000001
> >   .spread                        : 0.000000
> >   .spread0                       : -9792.255554
> >   .nr_spread_over                : 1
> >   .nr_running                    : 0
> >   .load                          : 0
> >   .load_avg                      : 7376.722880
> >   .load_period                   : 7.203830
> >   .load_contrib                  : 1023
> >   .load_tg                       : 1023
> >   .se->exec_start                : 282004.715064
> >   .se->vruntime                  : 18435.664560
> >   .se->sum_exec_runtime          : 8738.133429
> >   .se->wait_start                : 0.000000
> >   .se->sleep_start               : 0.000000
> >   .se->block_start               : 0.000000
> >   .se->sleep_max                 : 0.000000
> >   .se->block_max                 : 0.000000
> >   .se->exec_max                  : 77.977054
> >   .se->slice_max                 : 0.000000
> >   .se->wait_max                  : 2.664779
> >   .se->wait_sum                  : 29.970575
> >   .se->wait_count                : 102
> >   .se->load.weight               : 2
> >
> > So 1314 is a real time process and
> >
> > cpu.rt_period_us:
> > 1000000
> > ----------------------
> > cpu.rt_runtime_us:
> > 0
> >
> > When did it move to being a Real Time process (hint: see rt_nr_running
> > and rt_throttled)?
> >
> > Balbir
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
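
The rt_rq observation quoted above can also be checked mechanically. This
is just an illustrative sketch that parses text in the same layout as the
quoted sched_debug excerpt and reports RT run queues that have runnable
tasks but are throttled with zero runtime. The function names are made
up, and the parser only handles the "rt_rq[N]:/group" and ".field : value"
lines shown in this thread.

```python
import re

HEADER = re.compile(r"^(rt_rq|cfs_rq)\[(\d+)\]:(\S+)\s*$")
FIELD = re.compile(r"^\s*\.(\S+)\s*:\s*(-?[\d.]+)\s*$")


def parse_runqueues(text):
    """Split sched_debug-style text into per-runqueue dicts of numeric fields."""
    queues = []
    current = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            current = {
                "kind": m.group(1),
                "cpu": int(m.group(2)),
                "group": m.group(3),
                "fields": {},
            }
            queues.append(current)
            continue
        m = FIELD.match(line)
        if m and current is not None:
            current["fields"][m.group(1)] = float(m.group(2))
    return queues


def starved_rt_queues(queues):
    """RT run queues with runnable tasks, throttled, and no runtime budget."""
    result = []
    for q in queues:
        f = q["fields"]
        if (q["kind"] == "rt_rq"
                and f.get("rt_nr_running", 0) > 0
                and f.get("rt_throttled", 0) == 1
                and f.get("rt_runtime", 1) == 0):
            result.append(q)
    return result
```

Fed the excerpt from information.txt (without the quote prefixes), this
would flag the rt_rq of group /1314 and ignore its cfs_rq.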

-- 
Michal Hocko
SUSE Labs

