From: Andrew Morton <akpm@linux-foundation.org>
To: Jesper Krogh <jesper@krogh.cc>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Many open/close on same files yeilds "No such file or directory".
Date: Thu, 8 May 2008 23:22:41 -0700
Message-ID: <20080508232241.4b177abb.akpm@linux-foundation.org>
In-Reply-To: <4823EA9E.6050703@krogh.cc>
On Fri, 09 May 2008 08:09:34 +0200 Jesper Krogh <jesper@krogh.cc> wrote:
> Andrew Morton wrote:
> >> My feeling is that the script below may reveal the bug on any "busy"
> >> volume, where busy is lots of activity in the OS-cache of the volume,
> >> not on the actual drives.
> >
> > By this do you mean that there has to be a lot of other activity on the
> > system to reproduce it? Stuff which is turning over memory?
>
> Yes, something like that. (Sorry for not being able to be more
> concrete.) The application has "high activity" on a few files, not
> activity spread throughout the volume.
>
> > Because one possibility is that the cached dentry got reclaimed by memory
> > pressure and we have some race bug which causes us to think that the file
> > doesn't exist.
>
> What can I do to explore this theory?
Watch /proc/vmstat while the test is running. If the "*steal*" numbers
are going up, the system is reclaiming memory.
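Something along these lines would do for watching the counters. It is a rough
sketch, not a tested tool: the function names are made up for illustration,
and the exact "*steal*" counter names differ between kernel versions, so it
simply matches any counter whose name contains "steal".

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def steal_deltas(before, after):
    """Return the change of every 'steal' counter that moved between samples."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if "steal" in name and after[name] != before.get(name, 0)
    }


def watch(path="/proc/vmstat", interval=1.0):
    """Print steal-counter changes once per interval until interrupted."""
    with open(path) as f:
        prev = parse_vmstat(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            cur = parse_vmstat(f.read())
        delta = steal_deltas(prev, cur)
        if delta:
            print(time.strftime("%H:%M:%S"), delta)
        prev = cur
```

Call watch() in a second terminal while the test runs. Lines appearing at a
steady rate mean pages are being reclaimed; no output means no reclaim was
observed in that interval.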
> Can I disable caching of dentries
> and see it go away?
Nope.
> Does it fit the pattern that it is only the
> "open"-syscall that is hit (not read for example)?
Yes. open() will look up the filename in the dentry cache, read() will not.
If name lookup has a race against dentry cache reclaim, something like
this might happen.
But it'd be damned odd.
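To separate the two paths in a test, one could keep a descriptor open for
reads and reopen the file by name on every pass. The sketch below is only an
illustration of that idea, not the test script from earlier in the thread;
open_vs_read and its parameters are hypothetical names.

```python
import errno
import os


def open_vs_read(path, iterations=100_000):
    """Count ENOENT failures from open() while also reading via a held fd.

    Returns (open_enoent_count, bytes_read_through_held_fd).
    """
    held_fd = os.open(path, os.O_RDONLY)  # one name lookup, done up front
    enoent = 0
    total = 0
    try:
        for _ in range(iterations):
            # pread() works on the descriptor: no name lookup is involved.
            total += len(os.pread(held_fd, 1, 0))
            try:
                fd = os.open(path, os.O_RDONLY)  # name lookup on every pass
            except OSError as e:
                if e.errno != errno.ENOENT:
                    raise
                enoent += 1
            else:
                os.close(fd)
    finally:
        os.close(held_fd)
    return enoent, total
```

If the first number is ever non-zero for a file that nobody removed, while
the reads keep succeeding, that points at the lookup path and not at the
file data.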
> > (That still shouldn't happen because the dentry should be marked
> > recently-accessed, but perhaps the underlying inode gets reclaimed or
> > something. Grasping at straws here)
>
> When I disabled the NFS-server and ran my "real-world" program on a
> single processor (make -j 1), it ran through fine. It basically
> gets around 20 million chunks out of different files and assembles the
> chunks in a few other files. This processes more or less 5 individual
> sections, so make can run effectively with a concurrency of 5.
>
> I don't know if there can be any technical reasons for not seeing it on
> internally attached disks? (Other than that I just haven't been able to
> reproduce the same error conditions there.)
I can't think of any reason.
I guess a suitable test would be to run your little test app and then read
large files as fast as you can from as many disks as you can, to force
memory reclaim. If that triggers the bug, and the bug is more likely to
trigger the faster you read those files, then we have a theory.
Damned odd though.
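For the "read large files as fast as you can" part, a minimal sketch could
look like the following. It assumes one thread per file is enough to keep the
disks busy, and that the files together are larger than free memory so that
the page cache has to evict something; read_all, drain and CHUNK are names
invented here.

```python
import threading

CHUNK = 1 << 20  # read 1 MiB per call


def drain(path, totals, index):
    """Read `path` sequentially to the end and record the byte count."""
    count = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            count += len(block)
    totals[index] = count


def read_all(paths):
    """Read every file in `paths` concurrently, one thread per file."""
    totals = [0] * len(paths)
    threads = [
        threading.Thread(target=drain, args=(p, totals, i))
        for i, p in enumerate(paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Pass it one large file per disk, for example read_all(["/mnt/a/big",
"/mnt/b/big"]) with placeholder paths, and run the open/close test at the
same time. Vary the number of files to see whether the failure rate follows
the read rate.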