From: Razvan Gavril <razvan.g@plutohome.com>
To: Razvan Gavril <razvan.g@plutohome.com>
Cc: nfs@lists.sourceforge.net
Subject: Re: Bug starting 2.6.16 - rpc: bad TCP reclen
Date: Wed, 19 Jul 2006 17:46:34 +0300
Message-ID: <44BE45CA.3050603@plutohome.com>
In-Reply-To: <44BCC7A1.30104@plutohome.com>

Razvan Gavril wrote:
> I posted this on the Linux kernel mailing list but have had no answer so far.
>
> I have an NFS server and some diskless computers that have their
> root filesystem mounted via NFS from the server. In certain situations
> the diskless computers fail to write correctly to their NFS-mounted
> filesystem (some files get corrupted). In the NFS server's dmesg
> I see these messages:
>
> RPC: bad TCP reclen 0x5e9c5bec (non-terminal)
> RPC: bad TCP reclen 0x29db3277 (large)
> RPC: bad TCP reclen 0x698f6ccf (large)
> RPC: bad TCP reclen 0x336160a9 (large)
> RPC: bad TCP reclen 0x773ffdff (large)
> RPC: bad TCP reclen 0x231b8d5c (non-terminal)
> RPC: bad TCP reclen 0x39902af4 (large)
> RPC: bad TCP reclen 0x6048d9cc (non-terminal)
> RPC: bad TCP reclen 0x212f7e14 (non-terminal)
>
> These errors started when I upgraded from 2.6.15 to 2.6.16, and the
> problem is still present in the 2.6.17 kernel. So far I have tested:
>
> Client   Server   Result
> ------   ------   ------
> 2.6.15   2.6.15   Works
> 2.6.15   2.6.16   Errors
> 2.6.16   2.6.16   Errors
> 2.6.16   2.6.17   Errors
> 2.6.17   2.6.17   Errors
>
> From the looks of it, the problem seems to be related to the NFS server
> implementation in kernels newer than 2.6.15.
>
> The corrupted writes on the clients and the dmesg messages on the server
> are easy to reproduce with Debian on the client computers by running
> this script in parallel on more than one client:
>
> #!/bin/bash
> # bash is required: the script uses [[ ]] and $RANDOM
> while true; do
>         apt-get update
>         err=$?
>         [[ $err != 0 ]] && echo "Exiting $err" && exit $err
>
>         # you can replace gdb with any other package
>         apt-get -y install gdb
>         err=$?
>         [[ $err != 0 ]] && echo "Exiting $err" && exit $err
>
>         apt-get -y remove gdb
>         err=$?
>         [[ $err != 0 ]] && echo "Exiting $err" && exit $err
>
>         sleep $(( $RANDOM % 3 ))
> done
>
> After a couple of minutes (1-5 min) apt should segfault because one
> of its state files (/var/lib/dpkg/status or another) has been
> corrupted. FYI, the clients DON'T share any files or directories, so a
> race condition in apt can't be the cause. For every apt segfault on a
> client there is a corresponding RPC error message on the server.
>
> I also tried other scripts to reproduce the problem, for example
> copying many files (small and large) from one NFS share to another,
> but md5sum showed that the copies were never corrupted, so for now
> apt is the only way I have found to reproduce the bug.
>
> I'm here if you need any other info related to this problem.
>
Can someone at least confirm this bug and give me an idea where to start
debugging?
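One possible starting point is the record marker itself. In ONC RPC over TCP (RFC 5531, section 11, "Record Marking Standard"), each record fragment is preceded by a 4-byte big-endian header: the top bit flags the last fragment and the low 31 bits give the fragment length in bytes. The bash sketch below splits a marker into those two fields. The function names are hypothetical, and the size limit passed to `classify_reclen` is an assumed value for illustration, not the server's actual limit.

```shell
#!/bin/bash
# Hedged sketch: split an RPC-over-TCP record marker (RFC 5531, sec. 11)
# into its "last fragment" flag (bit 31) and fragment length (bits 0-30).

decode_reclen() {
    local marker=$(( $1 ))                 # accepts hex input, e.g. 0x5e9c5bec
    local last=$(( (marker >> 31) & 1 ))   # 1 = final fragment of the record
    local len=$(( marker & 0x7fffffff ))   # fragment length in bytes
    echo "last=$last len=$len"
}

# Hypothetical classifier mirroring the two complaints seen in the log.
# $2 is an ASSUMED maximum record size; the real server limit is not
# stated in this report.
classify_reclen() {
    local marker=$(( $1 )) max=$2
    if (( ((marker >> 31) & 1) == 0 )); then
        echo "non-terminal"
    elif (( (marker & 0x7fffffff) > max )); then
        echo "large"
    else
        echo "ok"
    fi
}

# Print the length field of each value from the server log.
for v in 0x5e9c5bec 0x29db3277 0x698f6ccf 0x336160a9 0x773ffdff \
         0x231b8d5c 0x39902af4 0x6048d9cc 0x212f7e14; do
    echo "$v -> $(decode_reclen "$v")"
done
```

Every logged value has bit 31 clear, and each length field works out to somewhere between roughly 550 MB and 2 GB. No plausible NFS request is that large, which suggests (but does not prove) that the server was reading ordinary payload bytes where it expected a record marker, i.e. it had lost its place in the TCP stream. `classify_reclen` is meant for raw markers; the values tagged "(large)" may already have had the flag bit removed before being printed, so feeding them back into it is not meaningful.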

Thanks

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

Thread overview: 6+ messages
2006-07-18 11:36 Buf starting 2.6.16 - rpc: bad TCP reclen Razvan Gavril
2006-07-19 14:46 ` Razvan Gavril [this message]
2006-07-19 15:22   ` Bug " Chuck Lever
2006-07-19 15:52     ` Razvan Gavril
2006-07-20  4:29       ` Neil Brown
2006-07-20  8:38         ` Bernd Schubert
