linux-fsdevel.vger.kernel.org archive mirror
* [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom
@ 2016-02-14  8:35 Eryu Guan
  2016-02-14 14:39 ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Eryu Guan @ 2016-02-14  8:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jlayton, jack

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

Hi,

Starting from v4.5-rc1, running inotify_add_watch/inotify_rm_watch in
a loop can trigger OOM and make the system unusable. The v4.4 kernel
is fine with the same stress test.

Reverting c510eff6beba ("fsnotify: destroy marks with call_srcu instead
of dedicated thread") on top of v4.5-rc3 passes the same test, so it
seems that this patch introduced some kind of memory leak?

On v4.5-rc[1-3] the test program triggers OOM within 10 minutes on my
test VM with 8G of memory. After reverting the commit in question, the
same VM survived more than an hour of the stress test.

	./inotify <mnt>

I've attached the test program and the OOM console log. If more
information is needed, please let me know.

Thanks,
Eryu

[-- Attachment #2: inotify.c --]
[-- Type: text/plain, Size: 505 bytes --]

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/inotify.h>

int main(int argc, char *argv[])
{
	int notify_fd;
	int wd, ret;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}

	notify_fd = inotify_init();
	if (notify_fd < 0) {
		perror("inotify_init");
		return 1;
	}

	/*
	 * Add and immediately remove a watch on the same path, as fast as
	 * possible, to stress fsnotify mark creation and destruction.
	 */
	for (;;) {
		wd = inotify_add_watch(notify_fd, argv[1],
				       IN_MODIFY | IN_CREATE | IN_DELETE);
		if (wd < 0) {
			perror("inotify_add_watch");
			continue;
		}
		ret = inotify_rm_watch(notify_fd, wd);
		if (ret < 0)
			perror("inotify_rm_watch");
	}

	/* Not reached: the loop runs until the process is killed. */
	close(notify_fd);
	return 0;
}

[-- Attachment #3: inotify-oom.log.bz2 --]
[-- Type: application/x-bzip2, Size: 7155 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom
  2016-02-14  8:35 [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom Eryu Guan
@ 2016-02-14 14:39 ` Jeff Layton
  2016-02-15  1:02   ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2016-02-14 14:39 UTC (permalink / raw)
  To: Eryu Guan; +Cc: linux-fsdevel, jack, Paul E. McKenney, akpm, torvalds, jack

On Sun, 14 Feb 2016 16:35:43 +0800
Eryu Guan <guaneryu@gmail.com> wrote:

> Hi,
> 
> Starting from v4.5-rc1, running inotify_add_watch/inotify_rm_watch in
> a loop can trigger OOM and make the system unusable. The v4.4 kernel
> is fine with the same stress test.
> 
> Reverting c510eff6beba ("fsnotify: destroy marks with call_srcu instead
> of dedicated thread") on top of v4.5-rc3 passes the same test, so it
> seems that this patch introduced some kind of memory leak?
> 
> On v4.5-rc[1-3] the test program triggers OOM within 10 minutes on my
> test VM with 8G of memory. After reverting the commit in question, the
> same VM survived more than an hour of the stress test.
> 
> 	./inotify <mnt>
> 
> I've attached the test program and the OOM console log. If more
> information is needed, please let me know.
> 
> Thanks,
> Eryu

Thanks Eryu, I think I see what the problem is. This reproducer is
creating and deleting marks very rapidly. But the SRCU code has this:

    #define SRCU_CALLBACK_BATCH     10                                              
    #define SRCU_INTERVAL           1                                               

So, process_srcu will only process 10 entries at a time, and only once
per jiffy. The upshot is that the reproducer can create entries _much_
faster than they can be cleaned up now that we're using call_srcu in
this codepath. If you kill the program before the OOM killer kicks in,
they all eventually get cleaned up, but it does take a while (minutes).
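
For scale: with HZ=1000 (an assumed config, just for the arithmetic), that
batching caps reclaim at roughly 10 * 1000 = 10,000 callbacks per second.
A hypothetical rate probe along the lines of Eryu's reproducer (sketched
below, not part of the original report) puts a number on the producer side:

/*
 * Hypothetical rate probe (not Eryu's attached reproducer): count how
 * many add/rm cycles a tight loop manages in ten seconds, to compare
 * against the roughly SRCU_CALLBACK_BATCH * HZ ceiling on reaping.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(int argc, char *argv[])
{
	unsigned long cycles = 0;
	time_t start;
	int fd, wd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}

	fd = inotify_init();
	if (fd < 0) {
		perror("inotify_init");
		return 1;
	}

	start = time(NULL);
	while (time(NULL) - start < 10) {
		wd = inotify_add_watch(fd, argv[1], IN_MODIFY);
		if (wd >= 0)
			inotify_rm_watch(fd, wd);
		cycles++;
	}

	printf("%lu add/rm cycles in 10s (~%lu per second)\n",
	       cycles, cycles / 10);
	close(fd);
	return 0;
}

Anything much above that ~10,000/s ceiling means the pending-callback list
(and the marks it pins) only grows until the OOM killer fires.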

I clearly didn't educate myself enough as to the limitations of
call_srcu before converting this code over to use it (and I missed
Paul's subtle hints in that regard). We may need to revert that patch
before v4.5 ships, but I'd like to ponder it for a few days and see
whether there is some way to batch them up so that they get reaped more
efficiently without requiring the dedicated thread.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom
  2016-02-14 14:39 ` Jeff Layton
@ 2016-02-15  1:02   ` Paul E. McKenney
  2016-02-15  1:24     ` Jeff Layton
  2016-02-15  2:33     ` Linus Torvalds
  0 siblings, 2 replies; 6+ messages in thread
From: Paul E. McKenney @ 2016-02-15  1:02 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Eryu Guan, linux-fsdevel, jack, akpm, torvalds

On Sun, Feb 14, 2016 at 09:39:31AM -0500, Jeff Layton wrote:
> On Sun, 14 Feb 2016 16:35:43 +0800
> Eryu Guan <guaneryu@gmail.com> wrote:
> 
> > Hi,
> > 
> > Starting from v4.5-rc1, running inotify_add_watch/inotify_rm_watch in
> > a loop can trigger OOM and make the system unusable. The v4.4 kernel
> > is fine with the same stress test.
> > 
> > Reverting c510eff6beba ("fsnotify: destroy marks with call_srcu instead
> > of dedicated thread") on top of v4.5-rc3 passes the same test, so it
> > seems that this patch introduced some kind of memory leak?
> > 
> > On v4.5-rc[1-3] the test program triggers OOM within 10 minutes on my
> > test VM with 8G of memory. After reverting the commit in question, the
> > same VM survived more than an hour of the stress test.
> > 
> > 	./inotify <mnt>
> > 
> > I've attached the test program and the OOM console log. If more
> > information is needed, please let me know.
> > 
> > Thanks,
> > Eryu
> 
> Thanks Eryu, I think I see what the problem is. This reproducer is
> creating and deleting marks very rapidly. But the SRCU code has this:
> 
>     #define SRCU_CALLBACK_BATCH     10                                              
>     #define SRCU_INTERVAL           1                                               
> 
> So, process_srcu will only process 10 entries at a time, and only once
> per jiffy. The upshot is that the reproducer can create entries _much_
> faster than they can be cleaned up now that we're using call_srcu in
> this codepath. If you kill the program before the OOM killer kicks in,
> they all eventually get cleaned up, but it does take a while (minutes).
> 
> I clearly didn't educate myself enough as to the limitations of
> call_srcu before converting this code over to use it (and I missed
> Paul's subtle hints in that regard). We may need to revert that patch
> before v4.5 ships, but I'd like to ponder it for a few days and see
> whether there is some way to batch them up so that they get reaped more
> efficiently without requiring the dedicated thread.

One thought would be to add an "emergency mode" to SRCU similar to that
already in RCU.  Something to the effect that if the current list of
callbacks is going to take more than a second to drain at the configured
per-jiffy rate, just process them without waiting.
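
Very roughly, the check could look like the toy model below (plain C, not
actual SRCU internals; the function name and the HZ value are illustrative):

#include <stdbool.h>

#define SRCU_CALLBACK_BATCH	10
#define HZ			1000	/* assumed config, only for the arithmetic */

/*
 * Draining 'pending' callbacks at SRCU_CALLBACK_BATCH per jiffy takes
 * pending / SRCU_CALLBACK_BATCH jiffies; if that exceeds HZ jiffies
 * (one second), stop rate-limiting and process the whole list now.
 */
static bool needs_emergency_drain(unsigned long pending)
{
	return pending > (unsigned long)SRCU_CALLBACK_BATCH * HZ;
}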

Would that help in this case, or am I missing something about the
reproducer?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom
  2016-02-15  1:02   ` Paul E. McKenney
@ 2016-02-15  1:24     ` Jeff Layton
  2016-02-15  2:33     ` Linus Torvalds
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2016-02-15  1:24 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Eryu Guan, linux-fsdevel, jack, akpm, torvalds

On Sun, 14 Feb 2016 17:02:38 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Sun, Feb 14, 2016 at 09:39:31AM -0500, Jeff Layton wrote:
> > On Sun, 14 Feb 2016 16:35:43 +0800
> > Eryu Guan <guaneryu@gmail.com> wrote:
> >   
> > > Hi,
> > > 
> > > Starting from v4.5-rc1, running inotify_add_watch/inotify_rm_watch in
> > > a loop can trigger OOM and make the system unusable. The v4.4 kernel
> > > is fine with the same stress test.
> > > 
> > > Reverting c510eff6beba ("fsnotify: destroy marks with call_srcu instead
> > > of dedicated thread") on top of v4.5-rc3 passes the same test, so it
> > > seems that this patch introduced some kind of memory leak?
> > > 
> > > On v4.5-rc[1-3] the test program triggers OOM within 10 minutes on my
> > > test VM with 8G of memory. After reverting the commit in question, the
> > > same VM survived more than an hour of the stress test.
> > > 
> > > 	./inotify <mnt>
> > > 
> > > I've attached the test program and the OOM console log. If more
> > > information is needed, please let me know.
> > > 
> > > Thanks,
> > > Eryu  
> > 
> > Thanks Eryu, I think I see what the problem is. This reproducer is
> > creating and deleting marks very rapidly. But the SRCU code has this:
> > 
> >     #define SRCU_CALLBACK_BATCH     10                                              
> >     #define SRCU_INTERVAL           1                                               
> > 
> > So, process_srcu will only process 10 entries at a time, and only once
> > per jiffy. The upshot is that the reproducer can create entries _much_
> > faster than they can be cleaned up now that we're using call_srcu in
> > this codepath. If you kill the program before the OOM killer kicks in,
> > they all eventually get cleaned up, but it does take a while (minutes).
> > 
> > I clearly didn't educate myself enough as to the limitations of
> > call_srcu before converting this code over to use it (and I missed
> > Paul's subtle hints in that regard). We may need to revert that patch
> > before v4.5 ships, but I'd like to ponder it for a few days and see
> > whether there is some way to batch them up so that they get reaped more
> > efficiently without requiring the dedicated thread.
> 
> One thought would be to add an "emergency mode" to SRCU similar to that
> already in RCU.  Something to the effect that if the current list of
> callbacks is going to take more than a second to drain at the configured
> per-jiffy rate, just process them without waiting.
> 
> Would that help in this case, or am I missing something about the
> reproducer?
> 
> 							Thanx, Paul

I sent a patchset just a little while ago that should fix this in an
even better way, I think, without using call_srcu. In addition to the
problem that Eryu mentions, fsnotify_put_mark can cause a cascade of
other "put" routines. While I don't see any that obviously can block,
we probably don't want to run that activity under local_bh_disable, so
I think that may be the best solution for fsnotify marks.

That said, the "emergency mode" you describe might make srcu more
useful overall. What may make even more sense is to simply run all of
the callbacks without waiting when there is a synchronize_srcu (or
srcu_barrier) that is blocked and waiting on all of the callbacks to
complete.
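
For fsnotify marks, the batching shape could look roughly like the sketch
below. It is only an illustration: the destroy_entry field, the lock, and
the work function name are made up here, and this is not necessarily what
the posted patchset does. The point is that one worker pays for a single
synchronize_srcu() per batch of marks instead of queueing a call_srcu()
per mark:

/* Sketch only: destroy_entry stands in for a list_head in struct fsnotify_mark. */
static LIST_HEAD(destroy_list);
static DEFINE_SPINLOCK(destroy_lock);

static void fsnotify_mark_destroy_workfn(struct work_struct *work)
{
	struct fsnotify_mark *mark, *next;
	LIST_HEAD(private);

	/* Grab the whole backlog in one go. */
	spin_lock(&destroy_lock);
	list_splice_init(&destroy_list, &private);
	spin_unlock(&destroy_lock);

	/* One grace period covers every mark in this batch. */
	synchronize_srcu(&fsnotify_mark_srcu);

	list_for_each_entry_safe(mark, next, &private, destroy_entry) {
		list_del_init(&mark->destroy_entry);
		fsnotify_put_mark(mark);
	}
}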

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom
  2016-02-15  1:02   ` Paul E. McKenney
  2016-02-15  1:24     ` Jeff Layton
@ 2016-02-15  2:33     ` Linus Torvalds
  2016-02-15  6:01       ` Paul E. McKenney
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2016-02-15  2:33 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Jeff Layton, Eryu Guan, linux-fsdevel, Jan Kara, Andrew Morton

On Sun, Feb 14, 2016 at 5:02 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> One thought would be to add an "emergency mode" to SRCU similar to that
> already in RCU.  Something to the effect that if the current list of
> callbacks is going to take more than a second to drain at the configured
> per-jiffy rate, just process them without waiting.
>
> Would that help in this case, or am I missing something about the
> reproducer?

Let's see if we can avoid srcu callbacks entirely as per Jeff; that
would be the best option.

                Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom
  2016-02-15  2:33     ` Linus Torvalds
@ 2016-02-15  6:01       ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2016-02-15  6:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Layton, Eryu Guan, linux-fsdevel, Jan Kara, Andrew Morton

On Sun, Feb 14, 2016 at 06:33:25PM -0800, Linus Torvalds wrote:
> On Sun, Feb 14, 2016 at 5:02 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > One thought would be to add an "emergency mode" to SRCU similar to that
> > already in RCU.  Something to the effect that if the current list of
> > callbacks is going to take more than a second to drain at the configured
> > per-jiffy rate, just process them without waiting.
> >
> > Would that help in this case, or am I missing something about the
> > reproducer?
> 
> Let's see if we can avoid srcu callbacks entirely as per Jeff; that
> would be the best option.

Agreed, no point in adding complication without use cases.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread

Thread overview: 6+ messages
2016-02-14  8:35 [BUG] inotify_add_watch/inotify_rm_watch loops trigger oom Eryu Guan
2016-02-14 14:39 ` Jeff Layton
2016-02-15  1:02   ` Paul E. McKenney
2016-02-15  1:24     ` Jeff Layton
2016-02-15  2:33     ` Linus Torvalds
2016-02-15  6:01       ` Paul E. McKenney
