* [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2()
@ 2009-01-05 3:23 Li Zefan
2009-02-09 8:40 ` Andrew Morton
0 siblings, 1 reply; 26+ messages in thread
From: Li Zefan @ 2009-01-05 3:23 UTC (permalink / raw)
To: LKML; +Cc: Andrew Morton, Paul Menage, Al Viro
Thread 1:
for ((; ;))
{
mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1
mkdir /mnt/0 > /dev/null 2>&1
rmdir /mnt/0 > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}
Thread 2:
for ((; ;))
{
mount -t cpuset xxx /mnt > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}
(Note: Again it is irrelevant which cgroup subsys is used.)
After a while this showed up:
------------[ cut here ]------------
WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2()
Hardware name: Aspire SA85
Modules linked in: bridge stp llc autofs4 dm_mirror dm_region_hash dm_log dm_mod r8169 parport_pc mii parport sg button sata_sis pata_sis ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 4745, comm: umount Not tainted 2.6.28 #479
Call Trace:
[<c042bbe3>] warn_slowpath+0x79/0x8f
[<c044babf>] ? __lock_acquire+0x69a/0x700
[<c04ae44e>] ? mntput_no_expire+0x79/0xf2
[<c04ae481>] mntput_no_expire+0xac/0xf2
[<c04ae968>] sys_umount+0x26a/0x2b1
[<c04ae9c1>] sys_oldumount+0x12/0x14
[<c0403251>] sysenter_do_call+0x12/0x31
---[ end trace 79d0ab4bef01333f ]---
The WARNING is: WARN_ON(atomic_read(&mnt->__mnt_writers));
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-01-05 3:23 [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() Li Zefan @ 2009-02-09 8:40 ` Andrew Morton 2009-02-09 8:49 ` Li Zefan 2009-02-09 9:34 ` Al Viro 0 siblings, 2 replies; 26+ messages in thread From: Andrew Morton @ 2009-02-09 8:40 UTC (permalink / raw) To: Li Zefan; +Cc: LKML, Paul Menage, Al Viro, containers, Arjan van de Ven (cc's added) On Mon, 05 Jan 2009 11:23:33 +0800 Li Zefan <lizf@cn.fujitsu.com> wrote: > Thread 1: > for ((; ;)) > { > mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 > mkdir /mnt/0 > /dev/null 2>&1 > rmdir /mnt/0 > /dev/null 2>&1 > umount /mnt > /dev/null 2>&1 > } > > Thread 2: > for ((; ;)) > { > mount -t cpuset xxx /mnt > /dev/null 2>&1 > umount /mnt > /dev/null 2>&1 > } > > (Note: Again it is irrelevant which cgroup subsys is used.) > > After a while this showed up: > > ------------[ cut here ]------------ > WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() > Hardware name: Aspire SA85 > Modules linked in: bridge stp llc autofs4 dm_mirror dm_region_hash dm_log dm_mod r8169 parport_pc mii parport sg button sata_sis pata_sis ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] > Pid: 4745, comm: umount Not tainted 2.6.28 #479 > Call Trace: > [<c042bbe3>] warn_slowpath+0x79/0x8f > [<c044babf>] ? __lock_acquire+0x69a/0x700 > [<c04ae44e>] ? mntput_no_expire+0x79/0xf2 > [<c04ae481>] mntput_no_expire+0xac/0xf2 > [<c04ae968>] sys_umount+0x26a/0x2b1 > [<c04ae9c1>] sys_oldumount+0x12/0x14 > [<c0403251>] sysenter_do_call+0x12/0x31 > ---[ end trace 79d0ab4bef01333f ]--- > > The WARNING is: WARN_ON(atomic_read(&mnt->__mnt_writers)); OK, I'm all confused. Here we see a WARN_ON triggered, but in http://lkml.org/lkml/2009/1/4/352 with the same testcase we're seeing a lockdep warning. You refer to Arjan's "lockdep: annotate sb ->s_umount" patch - but that's over two years old. And you say "The changelog said s_umount needs to be classified as per-sb, but actually it made it as per-filesystem." But what is the difference between per-sb and per-fs? More info here: http://bugzilla.kernel.org/show_bug.cgi?id=12673 This bug report seems to be all over the place. Is it a post-2.6.28 regression, btw? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 8:40 ` Andrew Morton @ 2009-02-09 8:49 ` Li Zefan 2009-02-09 11:03 ` Al Viro 2009-02-09 9:34 ` Al Viro 1 sibling, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-09 8:49 UTC (permalink / raw) To: Andrew Morton; +Cc: LKML, Paul Menage, Al Viro, containers, Arjan van de Ven Andrew Morton wrote: > (cc's added) > > On Mon, 05 Jan 2009 11:23:33 +0800 Li Zefan <lizf@cn.fujitsu.com> wrote: > >> Thread 1: >> for ((; ;)) >> { >> mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 >> mkdir /mnt/0 > /dev/null 2>&1 >> rmdir /mnt/0 > /dev/null 2>&1 >> umount /mnt > /dev/null 2>&1 >> } >> >> Thread 2: >> for ((; ;)) >> { >> mount -t cpuset xxx /mnt > /dev/null 2>&1 >> umount /mnt > /dev/null 2>&1 >> } >> >> (Note: Again it is irrelevant which cgroup subsys is used.) >> >> After a while this showed up: >> >> ------------[ cut here ]------------ >> WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() >> Hardware name: Aspire SA85 >> Modules linked in: bridge stp llc autofs4 dm_mirror dm_region_hash dm_log dm_mod r8169 parport_pc mii parport sg button sata_sis pata_sis ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] >> Pid: 4745, comm: umount Not tainted 2.6.28 #479 >> Call Trace: >> [<c042bbe3>] warn_slowpath+0x79/0x8f >> [<c044babf>] ? __lock_acquire+0x69a/0x700 >> [<c04ae44e>] ? mntput_no_expire+0x79/0xf2 >> [<c04ae481>] mntput_no_expire+0xac/0xf2 >> [<c04ae968>] sys_umount+0x26a/0x2b1 >> [<c04ae9c1>] sys_oldumount+0x12/0x14 >> [<c0403251>] sysenter_do_call+0x12/0x31 >> ---[ end trace 79d0ab4bef01333f ]--- >> >> The WARNING is: WARN_ON(atomic_read(&mnt->__mnt_writers)); > > OK, I'm all confused. Here we see a WARN_ON triggered, but in > http://lkml.org/lkml/2009/1/4/352 with the same testcase we're seeing a > lockdep warning. > They are 2 testcases with small difference ;) case 1: mount cat whichever control file umount case 2: mount mkdir /cgroup/0 rmdir /cgroup/0 umount > You refer to Arjan's "lockdep: annotate sb ->s_umount" patch - but > that's over two years old. > > And you say "The changelog said s_umount needs to be classified as > per-sb, but actually it made it as per-filesystem." But what is the > difference between per-sb and per-fs? > a filesystem can be single-sb or multile, isn't it? that's struct super_lock and struct file_system_type. I may be wrong here, since I don't know much about VFS... > More info here: http://bugzilla.kernel.org/show_bug.cgi?id=12673 > > This bug report seems to be all over the place. > > Is it a post-2.6.28 regression, btw? > I think it was introduced since cgroup was introduced. But it's hard to trigger in real-life, though it's easy using this test case. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 8:49 ` Li Zefan @ 2009-02-09 11:03 ` Al Viro 2009-02-09 11:58 ` Al Viro 0 siblings, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-09 11:03 UTC (permalink / raw) To: Li Zefan; +Cc: Andrew Morton, LKML, Paul Menage, containers, Arjan van de Ven BTW, a trivial note - kfree(root) in your ->kill_sb() is done earlier than it's nice to do. Shouldn't affect the problem, though. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 11:03 ` Al Viro @ 2009-02-09 11:58 ` Al Viro 2009-02-10 5:47 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-09 11:58 UTC (permalink / raw) To: Li Zefan; +Cc: Andrew Morton, LKML, Paul Menage, containers, Arjan van de Ven On Mon, Feb 09, 2009 at 11:03:48AM +0000, Al Viro wrote: > BTW, a trivial note - kfree(root) in your ->kill_sb() is done > earlier than it's nice to do. Shouldn't affect the problem, though. Other probably irrelevant notes: memcpy(start, cgrp->dentry->d_name.name, len); cgrp = cgrp->parent; if (!cgrp) break; dentry = rcu_dereference(cgrp->dentry); in cgroup_path(). Why don't we need rcu_dereference on both? Moreover, shouldn't that be memcpy(start, dentry->d_name.name, len); anyway, seeing that we'd just looked at dentry->d_name.len? In cgroup_rmdir(): spin_lock(&cgrp->dentry->d_lock); d = dget(cgrp->dentry); spin_unlock(&d->d_lock); cgroup_d_remove_dir(d); dput(d); Er? Comments, please... Unless something very unusual is going on, either that d_lock is pointless or dget() is rather unsafe. cgroups_clone() /* Now do the VFS work to create a cgroup */ inode = parent->dentry->d_inode; /* Hold the parent directory mutex across this operation to * stop anyone else deleting the new cgroup */ mutex_lock(&inode->i_mutex); Can the parent be in process of getting deleted by somebody else? If yes, we are in trouble here. BTW, that thing in cgroup_path()... What guarantees that cgroup_rename() won't hit between getting len and doing memcpy()? That said, cgroup seems to be completely agnostic wrt anything happening on vfsmount level, so I really don't see how it gets to that WARN_ON(). Hell knows; I really want to see the sequence of events - it might be something like fscking up ->s_active handling with interesting results (cgroup code is certainly hitting it in not quite usual ways), it may be genuine VFS-only race. Need more data... ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 11:58 ` Al Viro @ 2009-02-10 5:47 ` Li Zefan 0 siblings, 0 replies; 26+ messages in thread From: Li Zefan @ 2009-02-10 5:47 UTC (permalink / raw) To: Al Viro; +Cc: Andrew Morton, LKML, Paul Menage, containers, Arjan van de Ven Al Viro wrote: > On Mon, Feb 09, 2009 at 11:03:48AM +0000, Al Viro wrote: >> BTW, a trivial note - kfree(root) in your ->kill_sb() is done >> earlier than it's nice to do. Shouldn't affect the problem, though. > Do you mean kfree(root) should be called after kill_litter_super()? I don't see the point here.. > Other probably irrelevant notes: > > memcpy(start, cgrp->dentry->d_name.name, len); > cgrp = cgrp->parent; > if (!cgrp) > break; > dentry = rcu_dereference(cgrp->dentry); > > in cgroup_path(). Why don't we need rcu_dereference on both? > Moreover, shouldn't that be > memcpy(start, dentry->d_name.name, len); > anyway, seeing that we'd just looked at dentry->d_name.len? We are right, dentry-> but not cgrp->dentry-> should be used. > > In cgroup_rmdir(): > spin_lock(&cgrp->dentry->d_lock); > d = dget(cgrp->dentry); > spin_unlock(&d->d_lock); > > cgroup_d_remove_dir(d); > dput(d); > Er? Comments, please... Unless something very unusual is going on, > either that d_lock is pointless or dget() is rather unsafe. > The code was inherited from cpuset. I doubted it's redundant, but I was not confident enough to remove it. > cgroups_clone() > /* Now do the VFS work to create a cgroup */ > inode = parent->dentry->d_inode; > > /* Hold the parent directory mutex across this operation to > * stop anyone else deleting the new cgroup */ > mutex_lock(&inode->i_mutex); > Can the parent be in process of getting deleted by somebody else? If yes, > we are in trouble here. > > BTW, that thing in cgroup_path()... What guarantees that cgroup_rename() > won't hit between getting len and doing memcpy()? > cgroup_path() was inherited from cpuset's cpuset_path(), and I think it's true it races with rename. > That said, cgroup seems to be completely agnostic wrt anything happening > on vfsmount level, so I really don't see how it gets to that WARN_ON(). > Hell knows; I really want to see the sequence of events - it might be > something like fscking up ->s_active handling with interesting results > (cgroup code is certainly hitting it in not quite usual ways), it may be > genuine VFS-only race. Need more data... > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 8:40 ` Andrew Morton 2009-02-09 8:49 ` Li Zefan @ 2009-02-09 9:34 ` Al Viro 2009-02-09 11:30 ` Li Zefan 2009-02-09 17:48 ` Dave Hansen 1 sibling, 2 replies; 26+ messages in thread From: Al Viro @ 2009-02-09 9:34 UTC (permalink / raw) To: Andrew Morton; +Cc: Li Zefan, LKML, Paul Menage, containers, Arjan van de Ven On Mon, Feb 09, 2009 at 12:40:46AM -0800, Andrew Morton wrote: > > Thread 1: > > for ((; ;)) > > { > > mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 > > mkdir /mnt/0 > /dev/null 2>&1 > > rmdir /mnt/0 > /dev/null 2>&1 > > umount /mnt > /dev/null 2>&1 > > } > > > > Thread 2: > > { > > mount -t cpuset xxx /mnt > /dev/null 2>&1 > > umount /mnt > /dev/null 2>&1 > > } How cute... Same mountpoint in both, so these mount(2) will sometimes fail (cgroup picks the same sb on the same options, AFAICS) and fail silently due to these redirects... That's a lovely way to stress-test a large part of ro-bind stuff *and* umount()-related code. Could you do C equivalent of the above (just the same syscalls in loop, nothing fancier) and do time-stamped strace? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 9:34 ` Al Viro @ 2009-02-09 11:30 ` Li Zefan 2009-02-12 6:10 ` Li Zefan 2009-02-09 17:48 ` Dave Hansen 1 sibling, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-09 11:30 UTC (permalink / raw) To: Al Viro; +Cc: Andrew Morton, LKML, Paul Menage, containers, Arjan van de Ven Al Viro wrote: > On Mon, Feb 09, 2009 at 12:40:46AM -0800, Andrew Morton wrote: >>> Thread 1: >>> for ((; ;)) >>> { >>> mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 >>> mkdir /mnt/0 > /dev/null 2>&1 >>> rmdir /mnt/0 > /dev/null 2>&1 >>> umount /mnt > /dev/null 2>&1 >>> } >>> >>> Thread 2: >>> { >>> mount -t cpuset xxx /mnt > /dev/null 2>&1 >>> umount /mnt > /dev/null 2>&1 >>> } > > How cute... Same mountpoint in both, so these mount(2) will sometimes > fail (cgroup picks the same sb on the same options, AFAICS) and fail > silently due to these redirects... > > That's a lovely way to stress-test a large part of ro-bind stuff *and* > umount()-related code. Could you do C equivalent of the above (just > the same syscalls in loop, nothing fancier) and do time-stamped strace? > Sure, I'll write a C version and try to reproduce the warning. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 11:30 ` Li Zefan @ 2009-02-12 6:10 ` Li Zefan 2009-02-12 6:24 ` Al Viro 0 siblings, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-12 6:10 UTC (permalink / raw) To: Al Viro; +Cc: containers, Paul Menage, Andrew Morton, LKML, Arjan van de Ven Li Zefan wrote: > Al Viro wrote: >> On Mon, Feb 09, 2009 at 12:40:46AM -0800, Andrew Morton wrote: >>>> Thread 1: >>>> for ((; ;)) >>>> { >>>> mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 >>>> mkdir /mnt/0 > /dev/null 2>&1 >>>> rmdir /mnt/0 > /dev/null 2>&1 >>>> umount /mnt > /dev/null 2>&1 >>>> } >>>> >>>> Thread 2: >>>> { >>>> mount -t cpuset xxx /mnt > /dev/null 2>&1 >>>> umount /mnt > /dev/null 2>&1 >>>> } >> How cute... Same mountpoint in both, so these mount(2) will sometimes >> fail (cgroup picks the same sb on the same options, AFAICS) and fail >> silently due to these redirects... >> >> That's a lovely way to stress-test a large part of ro-bind stuff *and* >> umount()-related code. Could you do C equivalent of the above (just >> the same syscalls in loop, nothing fancier) and do time-stamped strace? >> > > Sure, I'll write a C version and try to reproduce the warning. > Unfortunately, the C equivalent can't reproduce the warning, I've run the test for the whole night. :( While using the script, often I can trigger the warning in several mins. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-12 6:10 ` Li Zefan @ 2009-02-12 6:24 ` Al Viro 2009-02-12 6:33 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-12 6:24 UTC (permalink / raw) To: Li Zefan; +Cc: containers, Paul Menage, Andrew Morton, LKML, Arjan van de Ven On Thu, Feb 12, 2009 at 02:10:37PM +0800, Li Zefan wrote: > Li Zefan wrote: > > Al Viro wrote: > >> On Mon, Feb 09, 2009 at 12:40:46AM -0800, Andrew Morton wrote: > >>>> Thread 1: > >>>> for ((; ;)) > >>>> { > >>>> mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 > >>>> mkdir /mnt/0 > /dev/null 2>&1 > >>>> rmdir /mnt/0 > /dev/null 2>&1 > >>>> umount /mnt > /dev/null 2>&1 > >>>> } > >>>> > >>>> Thread 2: > >>>> { > >>>> mount -t cpuset xxx /mnt > /dev/null 2>&1 > >>>> umount /mnt > /dev/null 2>&1 > >>>> } > >> How cute... Same mountpoint in both, so these mount(2) will sometimes > >> fail (cgroup picks the same sb on the same options, AFAICS) and fail > >> silently due to these redirects... > >> > >> That's a lovely way to stress-test a large part of ro-bind stuff *and* > >> umount()-related code. Could you do C equivalent of the above (just > >> the same syscalls in loop, nothing fancier) and do time-stamped strace? > >> > > > > Sure, I'll write a C version and try to reproduce the warning. > > > > Unfortunately, the C equivalent can't reproduce the warning, I've run the > test for the whole night. :( While using the script, often I can trigger > the warning in several mins. Ho-hum... I wonder if we are hitting cgroup_clone() in all that fun... Could you a) add a printk to that sucker b) independently from (a), see if wrapping these syscalls into pid = fork(); if (!pid) { [make a syscall, print something] exit(0); } else if (pid > 0) { waitpid(pid, NULL, 0); } and see what happens... ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-12 6:24 ` Al Viro @ 2009-02-12 6:33 ` Li Zefan 2009-02-12 6:54 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-12 6:33 UTC (permalink / raw) To: Al Viro; +Cc: containers, Paul Menage, Andrew Morton, LKML, Arjan van de Ven >>>> How cute... Same mountpoint in both, so these mount(2) will sometimes >>>> fail (cgroup picks the same sb on the same options, AFAICS) and fail >>>> silently due to these redirects... >>>> >>>> That's a lovely way to stress-test a large part of ro-bind stuff *and* >>>> umount()-related code. Could you do C equivalent of the above (just >>>> the same syscalls in loop, nothing fancier) and do time-stamped strace? >>>> >>> Sure, I'll write a C version and try to reproduce the warning. >>> >> Unfortunately, the C equivalent can't reproduce the warning, I've run the >> test for the whole night. :( While using the script, often I can trigger >> the warning in several mins. > > Ho-hum... I wonder if we are hitting cgroup_clone() in all that fun... I don't think so, I think cgroup_clone() will be called only if namespace is used, like clone(CLONE_NEWNS). Even if cgroup_clone() gets called, it will return before doing any vfs work unless the ns_cgroup subsystem is mounted. int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *subsys, char *nodename) { ... mutex_lock(&cgroup_mutex); again: root = subsys->root; if (root == &rootnode) { <--- here mutex_unlock(&cgroup_mutex); return 0; } > Could you > a) add a printk to that sucker > b) independently from (a), see if wrapping these syscalls into > pid = fork(); > if (!pid) { > [make a syscall, print something] > exit(0); > } else if (pid > 0) { > waitpid(pid, NULL, 0); > } > and see what happens... > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-12 6:33 ` Li Zefan @ 2009-02-12 6:54 ` Li Zefan 2009-02-12 7:07 ` Al Viro 0 siblings, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-12 6:54 UTC (permalink / raw) To: Al Viro; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML Li Zefan wrote: >>>>> How cute... Same mountpoint in both, so these mount(2) will sometimes >>>>> fail (cgroup picks the same sb on the same options, AFAICS) and fail >>>>> silently due to these redirects... >>>>> >>>>> That's a lovely way to stress-test a large part of ro-bind stuff *and* >>>>> umount()-related code. Could you do C equivalent of the above (just >>>>> the same syscalls in loop, nothing fancier) and do time-stamped strace? >>>>> >>>> Sure, I'll write a C version and try to reproduce the warning. >>>> >>> Unfortunately, the C equivalent can't reproduce the warning, I've run the >>> test for the whole night. :( While using the script, often I can trigger >>> the warning in several mins. >> Ho-hum... I wonder if we are hitting cgroup_clone() in all that fun... > > I don't think so, I think cgroup_clone() will be called only if namespace is > used, like clone(CLONE_NEWNS). Even if cgroup_clone() gets called, it will > return before doing any vfs work unless the ns_cgroup subsystem is mounted. > But the following testcase can also trigger the warning: thread 1: for ((; ;)) { mount -t cgroup -o ns xxx cgroup/ > /dev/null 2>&1 # remove the dirs generated by cgroup_clone() rmdir cgroup/[1-9]* > /dev/null 2>&1 umount cgroup/ > /dev/null 2>&1 } thread 2: int foo(void *arg) { return 0; } char *stack[4096]; int main(int argc, char **argv) { int usec = DEFAULT_USEC; while (1) { usleep(usec); # cgroup_clone() will be called clone(foo, stack+4096, CLONE_NEWNS, NULL); } return 0; } ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-12 6:54 ` Li Zefan @ 2009-02-12 7:07 ` Al Viro 2009-02-13 5:09 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-12 7:07 UTC (permalink / raw) To: Li Zefan; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML On Thu, Feb 12, 2009 at 02:54:58PM +0800, Li Zefan wrote: > But the following testcase can also trigger the warning: > > thread 1: > for ((; ;)) > { > mount -t cgroup -o ns xxx cgroup/ > /dev/null 2>&1 > # remove the dirs generated by cgroup_clone() > rmdir cgroup/[1-9]* > /dev/null 2>&1 > umount cgroup/ > /dev/null 2>&1 > } > > > thread 2: > > int foo(void *arg) > { return 0; } > > char *stack[4096]; > > int main(int argc, char **argv) > { > int usec = DEFAULT_USEC; > while (1) { > usleep(usec); > # cgroup_clone() will be called > clone(foo, stack+4096, CLONE_NEWNS, NULL); > } > > return 0; > } Uh-oh... That clone() will do more, actually - it will clone a bunch of vfsmounts. What happens if you create a separate namespace for the first thread, so that the second one would not have our vfsmount to play with? Alternatively, what if the second thread is doing mount --bind cgroup foo umount foo in a loop? Another one: does turning the umount in the first thread into umount -l affect anything? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-12 7:07 ` Al Viro @ 2009-02-13 5:09 ` Li Zefan 2009-02-13 5:47 ` Al Viro 0 siblings, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-13 5:09 UTC (permalink / raw) To: Al Viro; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML >> thread 1: >> for ((; ;)) >> { >> mount -t cgroup -o ns xxx cgroup/ > /dev/null 2>&1 >> # remove the dirs generated by cgroup_clone() >> rmdir cgroup/[1-9]* > /dev/null 2>&1 >> umount cgroup/ > /dev/null 2>&1 >> } >> >> >> thread 2: >> >> int foo(void *arg) >> { return 0; } >> >> char *stack[4096]; >> >> int main(int argc, char **argv) >> { >> int usec = DEFAULT_USEC; >> while (1) { >> usleep(usec); >> # cgroup_clone() will be called >> clone(foo, stack+4096, CLONE_NEWNS, NULL); >> } >> >> return 0; >> } > > Uh-oh... That clone() will do more, actually - it will clone a bunch > of vfsmounts. What happens if you create a separate namespace for the > first thread, so that the second one would not have our vfsmount to > play with? > The warning still can be triggered, but seems harder (cost me 1 hour) > Alternatively, what if the second thread is doing > mount --bind cgroup foo > umount foo > in a loop? > I ran following testcase, and triggered the warning in 1 hour: thread 1: for ((; ;)) { mount --bind /cgroup /mnt > /dev/null 2>&1 umount /mnt > /dev/null 2>&1 } tread 2: for ((; ;)) { mount -t cgroup -o cpu xxx /cgroup > /dev/null 2>&1 mkdir /cgroup/0 > /dev/null 2>&1 rmdir /cgroup/0 > /dev/null 2>&1 umount -l /cgroup > /dev/null 2>&1 } > Another one: does turning the umount in the first thread into umount -l > affect anything? > For this one, I ran the test for the whole night, but failed to hit the warning. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-13 5:09 ` Li Zefan @ 2009-02-13 5:47 ` Al Viro 2009-02-13 6:12 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-13 5:47 UTC (permalink / raw) To: Li Zefan; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML On Fri, Feb 13, 2009 at 01:09:17PM +0800, Li Zefan wrote: > I ran following testcase, and triggered the warning in 1 hour: > > thread 1: > for ((; ;)) > { > mount --bind /cgroup /mnt > /dev/null 2>&1 > umount /mnt > /dev/null 2>&1 > } > > tread 2: > for ((; ;)) > { > mount -t cgroup -o cpu xxx /cgroup > /dev/null 2>&1 > mkdir /cgroup/0 > /dev/null 2>&1 > rmdir /cgroup/0 > /dev/null 2>&1 > umount -l /cgroup > /dev/null 2>&1 > } Wow. You know, at that point these redirects could probably be removed. If anything in there ends up producing an output, we very much want to see that. Actually, I'd even make that mount --bind /cgroup/mnt || (echo mount1: ; date) etc., so we'd see when do they fail and which one fails (if any)... Which umount has failed in the above, BTW? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-13 5:47 ` Al Viro @ 2009-02-13 6:12 ` Li Zefan 2009-02-13 6:31 ` Li Zefan 2009-02-13 6:41 ` Al Viro 0 siblings, 2 replies; 26+ messages in thread From: Li Zefan @ 2009-02-13 6:12 UTC (permalink / raw) To: Al Viro; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML Al Viro wrote: > On Fri, Feb 13, 2009 at 01:09:17PM +0800, Li Zefan wrote: > >> I ran following testcase, and triggered the warning in 1 hour: >> >> thread 1: >> for ((; ;)) >> { >> mount --bind /cgroup /mnt > /dev/null 2>&1 >> umount /mnt > /dev/null 2>&1 >> } >> >> tread 2: >> for ((; ;)) >> { >> mount -t cgroup -o cpu xxx /cgroup > /dev/null 2>&1 >> mkdir /cgroup/0 > /dev/null 2>&1 >> rmdir /cgroup/0 > /dev/null 2>&1 >> umount -l /cgroup > /dev/null 2>&1 >> } > > Wow. You know, at that point these redirects could probably be removed. Ah, yes. > If anything in there ends up producing an output, we very much want to > see that. Actually, I'd even make that > mount --bind /cgroup/mnt || (echo mount1: ; date) > etc., so we'd see when do they fail and which one fails (if any)... > > Which umount has failed in the above, BTW? > > the first one sometimes failed, and the second one hasn't failed: mount: wrong fs type, bad option, bad superblock on /cgroup, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so mount1 Fri Feb 13 14:05:37 CST 2009 umount: /mnt: not mounted mount: wrong fs type, bad option, bad superblock on /cgroup, ... mount1 Fri Feb 13 14:08:34 CST 2009 umount: /mnt: not mounted mount: wrong fs type, bad option, bad superblock on /cgroup, ... mount1 Fri Feb 13 14:08:43 CST 2009 umount: /mnt: not mounted mount: wrong fs type, bad option, bad superblock on /cgroup, ... ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-13 6:12 ` Li Zefan @ 2009-02-13 6:31 ` Li Zefan 2009-02-13 6:41 ` Al Viro 1 sibling, 0 replies; 26+ messages in thread From: Li Zefan @ 2009-02-13 6:31 UTC (permalink / raw) To: Al Viro; +Cc: containers, LKML, Paul Menage, Andrew Morton, Arjan van de Ven Li Zefan wrote: > Al Viro wrote: >> On Fri, Feb 13, 2009 at 01:09:17PM +0800, Li Zefan wrote: >> >>> I ran following testcase, and triggered the warning in 1 hour: >>> >>> thread 1: >>> for ((; ;)) >>> { >>> mount --bind /cgroup /mnt > /dev/null 2>&1 >>> umount /mnt > /dev/null 2>&1 >>> } >>> >>> tread 2: >>> for ((; ;)) >>> { >>> mount -t cgroup -o cpu xxx /cgroup > /dev/null 2>&1 >>> mkdir /cgroup/0 > /dev/null 2>&1 >>> rmdir /cgroup/0 > /dev/null 2>&1 >>> umount -l /cgroup > /dev/null 2>&1 >>> } >> Wow. You know, at that point these redirects could probably be removed. > > Ah, yes. > >> If anything in there ends up producing an output, we very much want to >> see that. Actually, I'd even make that >> mount --bind /cgroup/mnt || (echo mount1: ; date) >> etc., so we'd see when do they fail and which one fails (if any)... >> >> Which umount has failed in the above, BTW? >> >> > > the first one sometimes failed, and the second one hasn't failed: > Just triggered the warning at about: Fri Feb 13 14:26:03 CST 2009 But both 2 threads were not failing at that time: mount: wrong fs type, bad option, bad superblock on /cgroup, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so mount1 Fri Feb 13 14:25:21 CST 2009 umount: /mnt: not mounted mount: wrong fs type, bad option, bad superblock on /cgroup, ... mount1 Fri Feb 13 14:26:32 CST 2009 umount: /mnt: not mounted mount: wrong fs type, bad option, bad superblock on /cgroup, ... ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-13 6:12 ` Li Zefan 2009-02-13 6:31 ` Li Zefan @ 2009-02-13 6:41 ` Al Viro 2009-02-13 7:18 ` Al Viro 1 sibling, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-13 6:41 UTC (permalink / raw) To: Li Zefan; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML On Fri, Feb 13, 2009 at 02:12:13PM +0800, Li Zefan wrote: > Al Viro wrote: > > On Fri, Feb 13, 2009 at 01:09:17PM +0800, Li Zefan wrote: > > > >> I ran following testcase, and triggered the warning in 1 hour: > >> > >> thread 1: > >> for ((; ;)) > >> { > >> mount --bind /cgroup /mnt > /dev/null 2>&1 > >> umount /mnt > /dev/null 2>&1 > >> } > >> > >> tread 2: > >> for ((; ;)) > >> { > >> mount -t cgroup -o cpu xxx /cgroup > /dev/null 2>&1 > >> mkdir /cgroup/0 > /dev/null 2>&1 > >> rmdir /cgroup/0 > /dev/null 2>&1 > >> umount -l /cgroup > /dev/null 2>&1 > >> } > > > > Wow. You know, at that point these redirects could probably be removed. > > Ah, yes. > > > If anything in there ends up producing an output, we very much want to > > see that. Actually, I'd even make that > > mount --bind /cgroup/mnt || (echo mount1: ; date) > > etc., so we'd see when do they fail and which one fails (if any)... > > > > Which umount has failed in the above, BTW? > > > > > > the first one sometimes failed, and the second one hasn't failed: > mount: wrong fs type, bad option, bad superblock on /cgroup, > missing codepage or helper program, or other error > In some cases useful info is found in syslog - try > dmesg | tail or so > > mount1 Hold on. In your last example the first one was doing mount --bind; has _that_ failed? Oh, wait... It can fail, all right, if lookup on /cgroup gives you your filesystem with the second thread managing to detach it before we get the namespace_sem. Then we'll fail that way - and clean up properly. Oh, well... The original question still stands: with those two scripts, which umount produces that WARN_ON? The trivial way to check would be to have a copy of /sbin/umount under a different name and use _that_ in one of the threads instead of umount. Then reproduce the WARN_ON and look at the process name in dmesg... ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-13 6:41 ` Al Viro @ 2009-02-13 7:18 ` Al Viro 2009-02-13 7:26 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-13 7:18 UTC (permalink / raw) To: Li Zefan; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML On Fri, Feb 13, 2009 at 06:41:35AM +0000, Al Viro wrote: Aaaargh... /* * We don't have to hold all of the locks at the * same time here because we know that we're the * last reference to mnt and that no new writers * can come in. */ for_each_possible_cpu(cpu) { struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); if (cpu_writer->mnt != mnt) continue; spin_lock(&cpu_writer->lock); is *almost* OK. Modulo SMP cache coherency. We know that nothing should be setting ->mnt to ours anymore, that's fine. But we do not know if we'd seen *earlier* change done on CPU in question (not the one we are running __mntput() on). I probably would still like to use milder solution in the long run, but for now let's check if turning that into struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); spin_lock(&cpu_writer->lock); if (cpu_writer->mnt != mnt) { spin_unlock(&cpu_writer->lock); continue; } prevents the problem, OK? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-13 7:18 ` Al Viro @ 2009-02-13 7:26 ` Li Zefan 2009-02-16 1:29 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-13 7:26 UTC (permalink / raw) To: Al Viro; +Cc: containers, Paul Menage, Arjan van de Ven, Andrew Morton, LKML Al Viro wrote: > On Fri, Feb 13, 2009 at 06:41:35AM +0000, Al Viro wrote: > > Aaaargh... > > /* > * We don't have to hold all of the locks at the > * same time here because we know that we're the > * last reference to mnt and that no new writers > * can come in. > */ > for_each_possible_cpu(cpu) { > struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); > if (cpu_writer->mnt != mnt) > continue; > spin_lock(&cpu_writer->lock); > > is *almost* OK. Modulo SMP cache coherency. We know that nothing should > be setting ->mnt to ours anymore, that's fine. But we do not know if > we'd seen *earlier* change done on CPU in question (not the one we > are running __mntput() on). > > I probably would still like to use milder solution in the long run, but for > now let's check if turning that into > > struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); > spin_lock(&cpu_writer->lock); > if (cpu_writer->mnt != mnt) { > spin_unlock(&cpu_writer->lock); > continue; > } > prevents the problem, OK? > Sure, I'll try. :) BTW, thread2's rmdir failed: rmdir: /cgroup/0: No such file or directory ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-13 7:26 ` Li Zefan @ 2009-02-16 1:29 ` Li Zefan 2009-02-16 2:38 ` Al Viro 0 siblings, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-16 1:29 UTC (permalink / raw) To: Al Viro; +Cc: containers, LKML, Paul Menage, Andrew Morton, Arjan van de Ven Li Zefan wrote: > Al Viro wrote: >> On Fri, Feb 13, 2009 at 06:41:35AM +0000, Al Viro wrote: >> >> Aaaargh... >> >> /* >> * We don't have to hold all of the locks at the >> * same time here because we know that we're the >> * last reference to mnt and that no new writers >> * can come in. >> */ >> for_each_possible_cpu(cpu) { >> struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); >> if (cpu_writer->mnt != mnt) >> continue; >> spin_lock(&cpu_writer->lock); >> >> is *almost* OK. Modulo SMP cache coherency. We know that nothing should >> be setting ->mnt to ours anymore, that's fine. But we do not know if >> we'd seen *earlier* change done on CPU in question (not the one we >> are running __mntput() on). >> >> I probably would still like to use milder solution in the long run, but for >> now let's check if turning that into >> >> struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); >> spin_lock(&cpu_writer->lock); >> if (cpu_writer->mnt != mnt) { >> spin_unlock(&cpu_writer->lock); >> continue; >> } >> prevents the problem, OK? >> > > Sure, I'll try. :) > Not a single warning for the whole weekend, so I think above change works. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-16 1:29 ` Li Zefan @ 2009-02-16 2:38 ` Al Viro 2009-02-16 2:47 ` Li Zefan 0 siblings, 1 reply; 26+ messages in thread From: Al Viro @ 2009-02-16 2:38 UTC (permalink / raw) To: Li Zefan Cc: containers, LKML, Paul Menage, Andrew Morton, Arjan van de Ven, gregkh, Linus Torvalds On Mon, Feb 16, 2009 at 09:29:00AM +0800, Li Zefan wrote: > >> struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); > >> spin_lock(&cpu_writer->lock); > >> if (cpu_writer->mnt != mnt) { > >> spin_unlock(&cpu_writer->lock); > >> continue; > >> } > >> prevents the problem, OK? > >> > > > > Sure, I'll try. :) > > > > Not a single warning for the whole weekend, so I think above change works. OK... So here's what we really want: * we know that nobody will set cpu_writer->mnt to mnt from now on * all changes to that sucker are done under cpu_writer->lock * we want the laziest equivalent of spin_lock(&cpu_writer->lock); if (likely(cpu_writer->mnt != mnt)) { spin_unlock(&cpu_writer->lock); continue; } /* do stuff */ that would make sure we won't miss earlier setting of ->mnt done by another CPU. Anyway, for now (HEAD and all -stable starting with 2.6.26) we want this: --- fs/namespace.c 2009-01-25 21:45:31.000000000 -0500 +++ fs/namespace.c 2009-02-15 21:31:14.000000000 -0500 @@ -614,9 +614,11 @@ */ for_each_possible_cpu(cpu) { struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); - if (cpu_writer->mnt != mnt) - continue; spin_lock(&cpu_writer->lock); + if (cpu_writer->mnt != mnt) { + spin_unlock(&cpu_writer->lock); + continue; + } atomic_add(cpu_writer->count, &mnt->__mnt_writers); cpu_writer->count = 0; /* ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-16 2:38 ` Al Viro @ 2009-02-16 2:47 ` Li Zefan 2009-02-16 2:57 ` Al Viro 0 siblings, 1 reply; 26+ messages in thread From: Li Zefan @ 2009-02-16 2:47 UTC (permalink / raw) To: Al Viro Cc: containers, LKML, Paul Menage, Andrew Morton, Arjan van de Ven, gregkh, Linus Torvalds > OK... So here's what we really want: > * we know that nobody will set cpu_writer->mnt to mnt from now on > * all changes to that sucker are done under cpu_writer->lock > * we want the laziest equivalent of > spin_lock(&cpu_writer->lock); > if (likely(cpu_writer->mnt != mnt)) { > spin_unlock(&cpu_writer->lock); > continue; > } > /* do stuff */ > that would make sure we won't miss earlier setting of ->mnt done by another > CPU. > If this is done, I'll be available to test it. > Anyway, for now (HEAD and all -stable starting with 2.6.26) we want this: > And here is my: Tested-by: Li Zefan <lizf@cn.fujitsu.com> > --- fs/namespace.c 2009-01-25 21:45:31.000000000 -0500 > +++ fs/namespace.c 2009-02-15 21:31:14.000000000 -0500 > @@ -614,9 +614,11 @@ > */ > for_each_possible_cpu(cpu) { > struct mnt_writer *cpu_writer = &per_cpu(mnt_writers, cpu); > - if (cpu_writer->mnt != mnt) > - continue; > spin_lock(&cpu_writer->lock); > + if (cpu_writer->mnt != mnt) { > + spin_unlock(&cpu_writer->lock); > + continue; > + } > atomic_add(cpu_writer->count, &mnt->__mnt_writers); > cpu_writer->count = 0; > /* ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-16 2:47 ` Li Zefan @ 2009-02-16 2:57 ` Al Viro 0 siblings, 0 replies; 26+ messages in thread From: Al Viro @ 2009-02-16 2:57 UTC (permalink / raw) To: Li Zefan Cc: containers, LKML, Paul Menage, Andrew Morton, Arjan van de Ven, gregkh, Linus Torvalds On Mon, Feb 16, 2009 at 10:47:02AM +0800, Li Zefan wrote: > And here is my: > > Tested-by: Li Zefan <lizf@cn.fujitsu.com> Right... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 9:34 ` Al Viro 2009-02-09 11:30 ` Li Zefan @ 2009-02-09 17:48 ` Dave Hansen 2009-02-09 18:11 ` Arjan van de Ven 1 sibling, 1 reply; 26+ messages in thread From: Dave Hansen @ 2009-02-09 17:48 UTC (permalink / raw) To: Al Viro; +Cc: Andrew Morton, containers, Paul Menage, LKML, Arjan van de Ven On Mon, 2009-02-09 at 09:34 +0000, Al Viro wrote: > On Mon, Feb 09, 2009 at 12:40:46AM -0800, Andrew Morton wrote: > > > Thread 1: > > > for ((; ;)) > > > { > > > mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 > > > mkdir /mnt/0 > /dev/null 2>&1 > > > rmdir /mnt/0 > /dev/null 2>&1 > > > umount /mnt > /dev/null 2>&1 > > > } > > > > > > Thread 2: > > > { > > > mount -t cpuset xxx /mnt > /dev/null 2>&1 > > > umount /mnt > /dev/null 2>&1 > > > } > > How cute... Same mountpoint in both, so these mount(2) will sometimes > fail (cgroup picks the same sb on the same options, AFAICS) and fail > silently due to these redirects... > > That's a lovely way to stress-test a large part of ro-bind stuff *and* > umount()-related code. Could you do C equivalent of the above (just > the same syscalls in loop, nothing fancier) and do time-stamped > strace? Could you also add a printk of what ->__mnt_writers was at the time of the WARN_ON()? That will hopefully at least tell us whether we're looking at a real leak or just a single missed mnt_want/drop_write(). Also hopefully in which direction the thing is biased. With the mount not being around long I'm not horribly hopeful, but it can't hurt. -- Dave ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() 2009-02-09 17:48 ` Dave Hansen @ 2009-02-09 18:11 ` Arjan van de Ven 0 siblings, 0 replies; 26+ messages in thread From: Arjan van de Ven @ 2009-02-09 18:11 UTC (permalink / raw) To: Dave Hansen; +Cc: Al Viro, Andrew Morton, containers, Paul Menage, LKML On Mon, 09 Feb 2009 09:48:59 -0800 Dave Hansen <dave@linux.vnet.ibm.com> wrote: > On Mon, 2009-02-09 at 09:34 +0000, Al Viro wrote: > > On Mon, Feb 09, 2009 at 12:40:46AM -0800, Andrew Morton wrote: > > > > Thread 1: > > > > for ((; ;)) > > > > { > > > > mount -t cgroup -o cpuset xxx /mnt > /dev/null 2>&1 > > > > mkdir /mnt/0 > /dev/null 2>&1 > > > > rmdir /mnt/0 > /dev/null 2>&1 > > > > umount /mnt > /dev/null 2>&1 > > > > } > > > > > > > > Thread 2: > > > > { > > > > mount -t cpuset xxx /mnt > /dev/null 2>&1 > > > > umount /mnt > /dev/null 2>&1 > > > > } > > > > How cute... Same mountpoint in both, so these mount(2) will > > sometimes fail (cgroup picks the same sb on the same options, > > AFAICS) and fail silently due to these redirects... > > > > That's a lovely way to stress-test a large part of ro-bind stuff > > *and* umount()-related code. Could you do C equivalent of the > > above (just the same syscalls in loop, nothing fancier) and do > > time-stamped strace? > > Could you also add a printk of what ->__mnt_writers was at the time of > the WARN_ON()? That will hopefully at least tell us whether we're > looking at a real leak or just a single missed mnt_want/drop_write(). > Also hopefully in which direction the thing is biased. With the mount > not being around long I'm not horribly hopeful, but it can't hurt. ... we could just change the WARN_ON to a WARN().. which has printk semantics ;) -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2009-02-16 2:57 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-01-05 3:23 [cgroup or VFS ?] WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() Li Zefan 2009-02-09 8:40 ` Andrew Morton 2009-02-09 8:49 ` Li Zefan 2009-02-09 11:03 ` Al Viro 2009-02-09 11:58 ` Al Viro 2009-02-10 5:47 ` Li Zefan 2009-02-09 9:34 ` Al Viro 2009-02-09 11:30 ` Li Zefan 2009-02-12 6:10 ` Li Zefan 2009-02-12 6:24 ` Al Viro 2009-02-12 6:33 ` Li Zefan 2009-02-12 6:54 ` Li Zefan 2009-02-12 7:07 ` Al Viro 2009-02-13 5:09 ` Li Zefan 2009-02-13 5:47 ` Al Viro 2009-02-13 6:12 ` Li Zefan 2009-02-13 6:31 ` Li Zefan 2009-02-13 6:41 ` Al Viro 2009-02-13 7:18 ` Al Viro 2009-02-13 7:26 ` Li Zefan 2009-02-16 1:29 ` Li Zefan 2009-02-16 2:38 ` Al Viro 2009-02-16 2:47 ` Li Zefan 2009-02-16 2:57 ` Al Viro 2009-02-09 17:48 ` Dave Hansen 2009-02-09 18:11 ` Arjan van de Ven
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox