* Effect of deleting executables of running programs
@ 2004-08-18 18:16 Shriram R
2004-08-18 19:18 ` Chris Wedgwood
2004-08-23 13:30 ` jlnance
0 siblings, 2 replies; 5+ messages in thread
From: Shriram R @ 2004-08-18 18:16 UTC (permalink / raw)
To: linux-kernel
Hi,
Newbie here. I am not sure if I am sending this email
to the right list. My apologies if I am not and I
would be happy if someone can point me to the right
mailing list.
We have a 24 node/48 processor cluster in our lab with
the following specs.
AMD Athlon
Redhat 7.3
Kernel version - 2.4.19
I had around 10 jobs that had been running on the
cluster for about 15 days, all using a common
executable "abcd.out" (compiled from Fortran 90). I then
made the mistake of deleting abcd.out. Immediately,
3 or 4 of my jobs crashed with a "bus error", but
6-7 of them continued running. I have 2 questions
about this:
a) I always thought that once a job is running, the
executable is entirely loaded into memory and the
abcd.out file is no longer needed. If so, why does
a running job crash on deleting abcd.out?
b) To what extent can I trust that the remaining 6-7
jobs have not been affected by the deletion of
"abcd.out"?
Thanks in advance,
shriram.
* Re: Effect of deleting executables of running programs
2004-08-18 18:16 Effect of deleting executables of running programs Shriram R
@ 2004-08-18 19:18 ` Chris Wedgwood
2004-08-23 13:30 ` jlnance
1 sibling, 0 replies; 5+ messages in thread
From: Chris Wedgwood @ 2004-08-18 19:18 UTC (permalink / raw)
To: Shriram R; +Cc: linux-kernel
On Wed, Aug 18, 2004 at 11:16:46AM -0700, Shriram R wrote:
> a) I always thought that once a job is running, the executable is
> entirely loaded into memory and the abcd.out file is no longer
> needed.
Not always, but it doesn't matter, since the file isn't actually removed
from disk until the last running instance terminates.
> If so, then why does a running job crash on deleting abcd.out?
I have no idea. But removing an executable whilst it is running isn't
uncommon; when you upgrade your machine this happens many times with
applications and libraries.
> b) To what extent can I trust that the rest of the 6-7 jobs that are
> running have not been affected by this deletion of "abcd.out" ?
Since one crashed already and there are no details as to why I have no
idea.
--cw
* Re: Effect of deleting executables of running programs
[not found] <200408181901.i7IJ1FK08538@orbit-fe.eng.netapp.com>
@ 2004-08-18 20:11 ` Shriram R
2004-08-18 23:41 ` Ryan Cumming
0 siblings, 1 reply; 5+ messages in thread
From: Shriram R @ 2004-08-18 20:11 UTC (permalink / raw)
To: linux-kernel
Yes, the nodes were accessing the executable over NFS.
-shriram
--- Brian Pawlowski <beepy@netapp.com> wrote:
> How were the nodes accessing the executable? Over
> NFS?
* Re: Effect of deleting executables of running programs
2004-08-18 20:11 ` Shriram R
@ 2004-08-18 23:41 ` Ryan Cumming
0 siblings, 0 replies; 5+ messages in thread
From: Ryan Cumming @ 2004-08-18 23:41 UTC (permalink / raw)
To: linux-kernel
On Wednesday 18 August 2004 13:11, Shriram R wrote:
> Yes, the nodes were accessing the executable over NFS.
My wild guess is that this has to do with executables paging themselves in on
demand. In the NFS case, if the file disappears on the share, paging in will
fail. However, on a local filesystem, the inode won't disappear until all
users are gone, so paging in on demand will still work.
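One way to see why a failed page-in shows up as a "bus error" is the
following minimal Python sketch. It assumes Linux and a local filesystem.
A child process maps a file, the file's backing store is then taken away,
and the next access to the mapped page kills the child with SIGBUS.
Truncation is used here only as a local stand-in for the file vanishing
on the NFS server; it is an analogy, not a reproduction of the NFS case,
and the helper name child_exit_signal is hypothetical.

```python
import signal
import subprocess
import sys

# Child: map one page of a file, shrink the file to zero, touch the page.
CHILD = r"""
import mmap, os, tempfile
fd, path = tempfile.mkstemp()
os.unlink(path)                          # keep the temp dir clean; fd stays valid
os.write(fd, b"x" * mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE, prot=mmap.PROT_READ)
os.ftruncate(fd, 0)                      # backing store for the page is gone
m[0]                                     # page fault can no longer be satisfied
"""

def child_exit_signal() -> int:
    """Run the child and return the signal that killed it (0 if none)."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    return -proc.returncode if proc.returncode < 0 else 0

if __name__ == "__main__":
    sig = child_exit_signal()
    print(signal.Signals(sig).name if sig else "child exited normally")
```

The access is done in a child process because the default action for
SIGBUS terminates the process, just as it terminated the crashed jobs.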
-Ryan
* Re: Effect of deleting executables of running programs
2004-08-18 18:16 Effect of deleting executables of running programs Shriram R
2004-08-18 19:18 ` Chris Wedgwood
@ 2004-08-23 13:30 ` jlnance
1 sibling, 0 replies; 5+ messages in thread
From: jlnance @ 2004-08-23 13:30 UTC (permalink / raw)
To: Shriram R, linux-kernel
On Wed, Aug 18, 2004 at 11:16:46AM -0700, Shriram R wrote:
> a) I always thought that once a job is running, the
> executable is entirely loaded into memory and the
> abcd.out file is no longer needed.
> If so, then why does a running job crash on
> deleting abcd.out ?
No, program sections are paged in as needed, so the
parts that have not run yet may not be in memory.
That said, the Unix way of dealing with files is by
reference counting. This means that you can open a
file and delete it, and it is still kept around on
the disk until you close it (running a program counts
as having its file open). So you are supposed to be
able to delete a program and running instances will not
be affected.
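The reference-counting behaviour can be checked with a short Python
sketch. It assumes a POSIX system and a temporary directory on a local
filesystem; the function name read_after_unlink is made up for the
illustration. The file is deleted while a descriptor to it is still
open, and the data remains readable through that descriptor.

```python
import os
import tempfile

def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it while open, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # the name is gone from the directory
        assert not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)       # the open fd still references the inode
        return os.read(fd, len(payload))
    finally:
        os.close(fd)                       # last reference dropped; space is freed

if __name__ == "__main__":
    print(read_after_unlink(b"still here"))
```

A running program holds the same kind of reference on its executable,
which is why deleting it is harmless on a local disk.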
Unfortunately, as you have discovered, NFS is kinda
sorta almost like a Unix file system, but not really.
You can NOT reliably access deleted files over NFS.
This is the root of what is causing your bus errors.
> b) To what extent can I trust that the rest of the 6-7
> jobs that are running have not been affected by this
> deletion of "abcd.out" ?
They are probably OK.
Thanks,
Jim