* [Lustre-devel] Lustre 1.8.2 client, Text file busy
@ 2010-03-29 14:22 Kent Engström
  2010-03-29 14:57 ` Kent Engström
  2010-03-29 23:48 ` [Lustre-devel] [slurm-dev] " Christopher J. Morrone
  0 siblings, 2 replies; 3+ messages in thread
From: Kent Engström @ 2010-03-29 14:22 UTC (permalink / raw)
  To: lustre-devel

[Cc: to the slurm-dev list as this has been discussed there.]

After an upgrade to Lustre 1.8.2 (patchless client on top of CentOS 5.4)
on one of our compute clusters, we have been getting reports of 
spurious "Text file busy" messages.

I have not seen any reports on the Lustre lists about this yet.

A colleague of mine was able to reproduce it reliably, and I've written
a small reproducer script:

$ cat reproducer.sh
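#!/bin/sh

# Create a trivial executable script.
rm -f myscript
cat <<EOF >myscript
#!/bin/sh
echo "running"
EOF
chmod +x myscript

# Repeatedly copy the script over the existing copy, then execute the copy.
rm -f mycopy
i=0
while :; do
  i=$((i + 1))
  echo "COPY $i"
  cp myscript mycopy
  echo "RUN $i"
  ./mycopy
  sleep 1
done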

When I run this on a Lustre filesystem, I invariably get:

$ ./reproducer.sh 
COPY 1
RUN 1
running
COPY 2
RUN 2
./reproducer.sh: ./mycopy: /bin/sh: bad interpreter: Text file busy
COPY 3
RUN 3
running
COPY 4
RUN 4
running
COPY 5
RUN 5
running
COPY 6
RUN 6
running
...

If I insert an "rm mycopy" command before the copy, I get no error.
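
For comparing clients, a bounded variant of the loop that counts failed
runs may be easier to work with than the endless one. The following is
only a sketch: the function name try_copy_and_run, its arguments, and the
"overwrite"/"remove" mode names are made up for illustration and are not
part of the original script. "remove" mode applies the "rm mycopy"
workaround described above; "overwrite" mode copies over the existing
file as the original loop does.

```shell
#!/bin/sh
# Sketch only: try_copy_and_run and its arguments are hypothetical names.
# Usage: try_copy_and_run DIR COUNT MODE [DELAY]
#   DIR    directory to test in (e.g. somewhere on the Lustre mount)
#   COUNT  number of copy-and-run iterations
#   MODE   "overwrite": copy over the existing file, as in the original loop
#          "remove":    run "rm -f mycopy" before each copy (the workaround)
#   DELAY  seconds to sleep between iterations (default 1, as in the original)
# Prints the number of iterations in which ./mycopy failed to run.
try_copy_and_run() {
  dir=$1 count=$2 mode=$3 delay=${4:-1}
  (
    cd "$dir" || exit 1
    # Same two-line script as in the reproducer.
    printf '#!/bin/sh\necho "running"\n' >myscript
    chmod +x myscript
    rm -f mycopy
    i=0
    failures=0
    while [ "$i" -lt "$count" ]; do
      i=$((i + 1))
      if [ "$mode" = remove ]; then
        rm -f mycopy
      fi
      cp myscript mycopy
      # Any non-zero exit (including a failed exec) counts as a failure.
      ./mycopy >/dev/null 2>&1 || failures=$((failures + 1))
      sleep "$delay"
    done
    echo "$failures"
  )
}
```

Calling it as, for example, "try_copy_and_run /path/on/lustre 10 overwrite"
and then again with "remove" gives two counts to compare. Going by the
observations in this message, one would expect a non-zero count only in
"overwrite" mode on the affected client, but that is an expectation, not
a measured result.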

$ uname -r; rpm -q lustre
2.6.18-164.15.1.el5
lustre-1.8.2-2.6.18_164.15.1.el5_201003191115

(patchless client built from the 1.8.2 source with "make rpms")

The servers for the filesystem are running
"lustre-1.6.7.1-2.6.18_92.1.17.el5_lustre.1.6.7.1smp".

I've tested the same code on another cluster that mounts the same
filesystem. It runs CentOS 4 with patchless client 
lustre-1.6.7.2-2.6.9_89.0.19.ELsmp_201001151307.
The error cannot be reproduced there.

I also expect that there will be no "Text file busy" error when I revert
a node on the first cluster to 1.6.7.1 and run the test script, which I
will proceed to do now.

-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444


* [Lustre-devel] Lustre 1.8.2 client, Text file busy
  2010-03-29 14:22 [Lustre-devel] Lustre 1.8.2 client, Text file busy Kent Engström
@ 2010-03-29 14:57 ` Kent Engström
  2010-03-29 23:48 ` [Lustre-devel] [slurm-dev] " Christopher J. Morrone
  1 sibling, 0 replies; 3+ messages in thread
From: Kent Engström @ 2010-03-29 14:57 UTC (permalink / raw)
  To: lustre-devel

kent at nsc.liu.se (Kent Engström) writes:
> After an upgrade to Lustre 1.8.2 (patchless client on top of Centos 5.4)
> on one of our compute clusters, we have been getting reports of 
> spurious "Text file busy" messages.
...

[Can reproduce on client running]
> $ uname -r; rpm -q lustre
> 2.6.18-164.15.1.el5
> lustre-1.8.2-2.6.18_164.15.1.el5_201003191115
...
> I also expect that there will be no "Text file busy" error when I revert
> a node on the first cluster to 1.6.7.1 and run the test script, which I
> will proceed to do now.

I have now tested on a node reverted to

$ uname -r; rpm -q lustre
2.6.18-164.11.1.el5
lustre-1.6.7.1-2.6.18_164.11.1.el5_201001202236

On that node, I could not reproduce the problem.


-- 
Kent Engström, National Supercomputer Centre
kent at nsc.liu.se, +46 13 28 4444


* [Lustre-devel] [slurm-dev] Lustre 1.8.2 client, Text file busy
  2010-03-29 14:22 [Lustre-devel] Lustre 1.8.2 client, Text file busy Kent Engström
  2010-03-29 14:57 ` Kent Engström
@ 2010-03-29 23:48 ` Christopher J. Morrone
  1 sibling, 0 replies; 3+ messages in thread
From: Christopher J. Morrone @ 2010-03-29 23:48 UTC (permalink / raw)
  To: lustre-devel

We opened bug 22492 on this issue.  Feel free to attach your reproducer 
script and observations there!

   https://bugzilla.lustre.org/show_bug.cgi?id=22492

Chris

