linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: mathieu lacage <mathieu.lacage@alcmeon.com>
Cc: jack@suse.cz,
	Olivier Chapelliere <olivier.chapelliere@alcmeon.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: stuck in inotify_release
Date: Tue, 14 May 2019 17:44:05 +0200
Message-ID: <20190514154405.GA30401@quack2.suse.cz>
In-Reply-To: <CAC8Mkjy=igiQatSVXNXphjyzGn2faZ75XZZGANWOtt3hvwk8DA@mail.gmail.com>

Hi!

On Tue 14-05-19 16:35:29, mathieu lacage wrote:
> We are going to set up a new Ubuntu 16.04 server, build a vanilla 5.0
> kernel on it, and run a fraction of our production workload there. Is
> this OK with you? If so, I will let you know as soon as we observe the
> problem on this server again.

Yes, that should rule out any Ubuntu-specific problems. Thanks!

								Honza

> 
> Mathieu
> 
> On Tue, May 14, 2019 at 15:22, Olivier Chapelliere <olivier.chapelliere@alcmeon.com> wrote:
> 
> > ---------- Forwarded message ---------
> > From: Jan Kara <jack@suse.cz>
> > Date: Tue, May 14, 2019 at 11:25 AM
> > Subject: Re: stuck in inotify_release
> > To: Olivier Chapelliere <olivier.chapelliere@alcmeon.com>
> > Cc: Jan Kara <jack@suse.cz>, <linux-fsdevel@vger.kernel.org>
> >
> >
> > Hello!
> >
> > On Mon 06-05-19 20:54:24, Olivier Chapelliere wrote:
> > > It finally took a month to happen again: Python processes watching a
> > > directory are stuck in inotify_release.
> > > I ran the sysrq commands as you requested and attached the output.
> >
> > Thanks. I was looking into these traces, but the situation is the same as
> > before. Everyone is blocked waiting for the inotify group to shut down. The
> > group shutdown, in turn, is blocked waiting for the worker to finish
> > destroying notification marks, and the worker is blocked in
> > synchronize_srcu() waiting for the SRCU grace period to end. I didn't find
> > any process that would be holding the SRCU lock, so it seems that someone
> > exited the SRCU locked section without releasing the lock. I've checked the
> > 4.15 kernel your Ubuntu kernel is based on and I don't see how that would be
> > possible. It is possible, though, that the problem was introduced by some
> > Ubuntu-specific backports. Would it be possible for you to run a vanilla
> > kernel (i.e., without Ubuntu modifications)?
> >
> >                                                                 Honza
> >
> > > On Thu, Mar 28, 2019 at 10:52 AM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Thu 28-03-19 09:26:45, Olivier Chapelliere wrote:
> > > > > According to what I read on the internet, you seem to be the right
> > > > > person to get in touch with when one has problems with inotify.
> > > >
> > > > Yes, there's also the linux-fsdevel@vger.kernel.org mailing list, which
> > > > we use (added to CC).
> > > >
> > > > > We are monitoring several directories from Python processes through
> > > > > inotify. But after a few days, all processes are stuck in a call to
> > > > > inotify_release. Once I detected the problem, I dumped info to dmesg
> > > > > with sysrq-trigger (dmesg content attached):
> > > > > echo w > /proc/sysrq-trigger
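Editorial note: inotify_release() is the release handler of an inotify file descriptor, so it runs when the last reference to that descriptor is dropped, typically on close(). The sketch below is not the reporters' watcher code; it assumes Linux with glibc and calls inotify_init1() and inotify_add_watch() through ctypes. The function name watch_and_close() and the file name probe.txt are hypothetical. It only illustrates the lifecycle whose final close() is where the stuck processes were blocked.

```python
from __future__ import annotations

import ctypes
import ctypes.util
import os
import struct
import tempfile

# Constants from <sys/inotify.h>; IN_NONBLOCK is defined as O_NONBLOCK on Linux.
IN_CREATE = 0x00000100
IN_NONBLOCK = os.O_NONBLOCK

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def _check(ret: int) -> int:
    """Turn a -1 return from libc into an OSError carrying errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def watch_and_close(directory: str) -> list[str]:
    """Watch `directory` for IN_CREATE, create one file, read the queued
    event(s), then close the inotify descriptor."""
    fd = _check(_libc.inotify_init1(IN_NONBLOCK))
    try:
        _check(_libc.inotify_add_watch(fd, directory.encode(), IN_CREATE))
        # The create event is queued synchronously during open(), so the
        # non-blocking read below finds it.
        open(os.path.join(directory, "probe.txt"), "w").close()
        data = os.read(fd, 4096)
        names = []
        offset = 0
        # struct inotify_event { int wd; uint32_t mask, cookie, len; char name[]; }
        while offset < len(data):
            _wd, mask, _cookie, length = struct.unpack_from("iIII", data, offset)
            offset += 16
            name = data[offset:offset + length].rstrip(b"\0").decode()
            offset += length
            if mask & IN_CREATE:
                names.append(name)
        return names
    finally:
        # Dropping the last reference ends in fput() -> inotify_release(),
        # the call the Python processes in this report never returned from.
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(watch_and_close(scratch))
```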
> > > >
> > > > Looking through the stack traces, all of them wait in fput() ->
> > > > inotify_release() -> ... -> fsnotify_wait_marks_destroyed() ->
> > > > flush_delayed_work(&reaper_work). So they wait for the worker process to
> > > > destroy all marks for the group. However, that worker (kworker/u8:4) is
> > > > stuck in:
> > > >
> > > > fsnotify_mark_destroy_workfn() -> synchronize_srcu(&fsnotify_mark_srcu)
> > > >
> > > > So the question is who is holding fsnotify_mark_srcu so that SRCU
> > > > cannot declare a new grace period. I don't see any such process among
> > > > the processes you've shown in the dump (but it should be there), so it's
> > > > a bit of a mystery.
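Editorial note: the diagnosis rests on one property of SRCU: synchronize_srcu() returns only after every reader that entered the domain before the call has left it. The toy Python model below (the class ToySrcu and its methods are hypothetical, not kernel API) mimics the two-counter idea: readers count against the current index, and the updater flips the index and waits for the old counter to drain. It shows why a single reader that takes the lock and never releases it keeps the grace period open forever, matching the kworker stuck in synchronize_srcu(). The real kernel implementation uses per-CPU counters and memory barriers, has no timeout, and serializes updaters; the timeout here exists only so the demo terminates.

```python
import threading


class ToySrcu:
    """Hypothetical, heavily simplified SRCU-like domain (NOT kernel API).

    Only readers that started before synchronize() can delay it, because
    new readers are steered to the other counter. Assumes a single updater.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._idx = 0
        self._count = [0, 0]

    def read_lock(self) -> int:
        with self._cond:
            idx = self._idx
            self._count[idx] += 1
            return idx  # caller must hand this back to read_unlock()

    def read_unlock(self, idx: int) -> None:
        with self._cond:
            self._count[idx] -= 1
            self._cond.notify_all()

    def synchronize(self, timeout: float) -> bool:
        """Wait for pre-existing readers; False means the grace period
        did not end within `timeout` seconds."""
        with self._cond:
            old = self._idx
            self._idx ^= 1  # readers arriving from now on use the other counter
            return self._cond.wait_for(lambda: self._count[old] == 0, timeout)


if __name__ == "__main__":
    srcu = ToySrcu()
    idx = srcu.read_lock()
    srcu.read_unlock(idx)
    print(srcu.synchronize(timeout=0.1))  # True: balanced reader

    srcu.read_lock()  # leaked reader: lock taken, never released
    print(srcu.synchronize(timeout=0.1))  # False: grace period never ends
```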
> > > >
> > > > > Our production environment is Ubuntu 18.04, kernel 4.15, ext4.
> > > > > This problem appears on a weekly basis, so I will be able to run
> > > > > additional commands to track down the issue if needed.
> > > >
> > > > So when this happens again, try grabbing the output of sysrq-l and
> > > > sysrq-t to see if we can find the task holding fsnotify_mark_srcu.
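Editorial note: sysrq-l dumps backtraces of all active CPUs and sysrq-t dumps all current tasks; both can be triggered without a keyboard by writing the key to /proc/sysrq-trigger as root, with the output landing in the kernel log. The kernels discussed here act only on the first character of each write, so each key gets its own write. The helper trigger_sysrq() below is a hypothetical convenience wrapper, not an existing tool; its demo writes to a scratch file so it can run without root.

```python
import os
import tempfile

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(keys: str, path: str = SYSRQ_TRIGGER) -> None:
    """Issue each SysRq command in `keys` as a separate write.

    Writing to /proc/sysrq-trigger needs root; the resulting dumps go to
    the kernel log (read them with dmesg afterwards).
    """
    for key in keys:
        with open(path, "w") as trigger:
            trigger.write(key)


if __name__ == "__main__":
    # Dry run against a scratch file; as root, call trigger_sysrq("lt")
    # to hit the real trigger file instead.
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    trigger_sysrq("lt", path=scratch)
    os.remove(scratch)
```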
> > > >
> > > >                                                                 Honza
> > > > --
> > > > Jan Kara <jack@suse.com>
> > > > SUSE Labs, CR
> > >
> > >
> > >
> > > --
> > > Olivier Chapelliere
> >
> >
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> >
> >
> > --
> > Olivier Chapelliere
> >
> 
> 
> -- 
> Mathieu Lacage <mathieu.lacage@alcmeon.com>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Thread overview: 5+ messages
     [not found] <CANp+0hhbsegocrx-MK0DS=Qx4DfivB27nSKHrukiFAY6x6cJQA@mail.gmail.com>
2019-03-28  9:52 ` stuck in inotify_release Jan Kara
2019-05-06 18:54   ` Olivier Chapelliere
2019-05-14  6:22     ` Olivier Chapelliere
2019-05-14  9:25     ` Jan Kara
     [not found]       ` <CANp+0hiZt=oEWMUqRC-pv9=8JnvSyPcpDCf+O5whth1C_q0jNA@mail.gmail.com>
     [not found]         ` <CAC8Mkjy=igiQatSVXNXphjyzGn2faZ75XZZGANWOtt3hvwk8DA@mail.gmail.com>
2019-05-14 15:44           ` Jan Kara [this message]