* Fwd: Unstable NFS mount at heavy load.
From: G.R. @ 2013-01-05 5:12 UTC
To: xen-devel

Forwarding this to the devel list.

---------- Forwarded message ----------
From: G.R. <firemeteor@users.sourceforge.net>
Date: Sat, Jan 5, 2013 at 1:12 AM
Subject: Unstable NFS mount at heavy load.
To: xen-users@lists.xen.org

I was running an I/O performance benchmark using iozone3.
In my build, the dom0 resides on a small USB stick and all the storage
comes from an NFS mount.
I tested NFS performance on both dom0 and domU, mounting from the same server.

The dom0 test works fine, but the domU run suffers from an unstable NFS mount.
Since this is an NFS root, the domU simply appears to be frozen.

The logs from both ends of the NFS mount show that the connection is broken.
Note that the client timestamps are about 20 seconds ahead of the server's.

From the domU (client end):
Jan 4 23:31:16 debvm kernel: [ 371.008142] nfs: server 192.168.1.8 not responding, still trying //(once)
Jan 4 23:31:25 debvm kernel: [ 379.928142] nfs: server 192.168.1.8 not responding, still trying //(28 times within the same second)
Jan 4 23:31:26 debvm kernel: [ 381.396143] nfs: server 192.168.1.8 not responding, still trying //(once)
Jan 4 23:31:44 debvm kernel: [ 399.452129] nfs: server 192.168.1.8 not responding, still trying //(14 times within the same second)
Jan 4 23:31:45 debvm kernel: [ 399.524210] nfs: server 192.168.1.8 not responding, still trying //(15 times within the same second)
Jan 4 23:31:46 debvm kernel: [ 400.964142] nfs: server 192.168.1.8 not responding, still trying //(once)
Jan 4 23:31:55 debvm kernel: [ 410.468787] nfs: server 192.168.1.8 OK //(25 times within the same second)
Jan 4 23:31:56 debvm kernel: [ 410.520202] nfs: server 192.168.1.8 OK //(32 times within the same second)
Jan 4 23:32:05 debvm kernel: [ 420.208141] nfs: server 192.168.1.8 not responding, still trying //(21 times within the same second)
Jan 4 23:32:09 debvm kernel: [ 424.367613] nfs: server 192.168.1.8 OK //(25 times within the same second)
Jan 4 23:32:11 debvm kernel: [ 425.764143] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:32:11 debvm kernel: [ 425.772031] nfs: server 192.168.1.8 OK
Jan 4 23:32:11 debvm kernel: [ 426.466328] nfs: server 192.168.1.8 OK
Jan 4 23:33:32 debvm kernel: [ 507.136150] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:34:20 debvm kernel: [ 555.170556] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:37:28 debvm kernel: [ 742.616155] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:39:39 debvm kernel: [ 873.880200] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:40:15 debvm kernel: [ 909.987313] nfs: server 192.168.1.8 OK //(91 times within the same second)
Jan 4 23:40:27 debvm kernel: [ 921.776152] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:40:34 debvm kernel: [ 929.314639] nfs: server 192.168.1.8 OK
Jan 4 23:42:05 debvm kernel: [ 1019.584149] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:42:13 debvm kernel: [ 1028.504158] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:42:53 debvm kernel: [ 1067.565487] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:44:28 debvm kernel: [ 1163.368977] nfs: server 192.168.1.8 OK
Jan 4 23:44:33 debvm kernel: [ 1168.337859] nfs: server 192.168.1.8 OK
Jan 4 23:45:41 debvm kernel: [ 1236.448135] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:49:37 debvm kernel: [ 1471.960302] nfs: server 192.168.1.8 not responding, still trying
Jan 4 23:51:00 debvm kernel: [ 1554.982479] nfs: server 192.168.1.8 OK

From the server side:
Jan 4 23:31:33 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:31:33 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:39:50 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:39:50 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:39:50 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:40:10 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:44:01 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:44:01 Hasim kernel: net_ratelimit: 11 callbacks suppressed
Jan 4 23:44:01 Hasim kernel: nfsd: peername failed (err 107)!
Jan 4 23:50:38 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
Jan 4 23:50:38 Hasim kernel: nfsd: peername failed (err 107)!

Any suggestions on how to debug this issue?
My Xen version is 4.2.1, the domU kernel is 3.6.9, and the domU is PVHVM.

Thanks,
Timothy
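One way to quantify the stalls in the client log above is to pair each "not responding" message with the next "OK" message and measure the gap in kernel uptime. The following is only a minimal sketch under that assumption; the names `LINE_RE`, `outage_windows` and `SAMPLE` are illustrative and not part of any tool mentioned in the thread, and the sample holds only a few lines copied from the report.

```python
import re

# Matches the kernel uptime stamp and the NFS client status in lines such as
# "Jan 4 23:31:16 debvm kernel: [ 371.008142] nfs: server 192.168.1.8 not responding, still trying"
LINE_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+nfs: server (?P<server>\S+) "
    r"(?P<state>not responding|OK)"
)


def outage_windows(lines):
    """Return (start, end, duration) tuples, in seconds of kernel uptime.

    A window opens at the first "not responding" message seen while the
    server is considered reachable and closes at the next "OK" message.
    Repeated messages inside an open window are ignored.
    """
    windows = []
    down_since = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an NFS client status line
        stamp = float(match.group("uptime"))
        if match.group("state") == "not responding":
            if down_since is None:
                down_since = stamp
        elif down_since is not None:
            windows.append((down_since, stamp, round(stamp - down_since, 6)))
            down_since = None
    return windows


# A few lines taken from the client log in the report (annotations removed).
SAMPLE = [
    "Jan 4 23:31:16 debvm kernel: [ 371.008142] nfs: server 192.168.1.8 not responding, still trying",
    "Jan 4 23:31:55 debvm kernel: [ 410.468787] nfs: server 192.168.1.8 OK",
    "Jan 4 23:31:56 debvm kernel: [ 410.520202] nfs: server 192.168.1.8 OK",
    "Jan 4 23:32:05 debvm kernel: [ 420.208141] nfs: server 192.168.1.8 not responding, still trying",
    "Jan 4 23:32:09 debvm kernel: [ 424.367613] nfs: server 192.168.1.8 OK",
]

if __name__ == "__main__":
    for start, end, duration in outage_windows(SAMPLE):
        print(f"{start:.3f} -> {end:.3f}: stalled for {duration:.3f} s")
```

Using the kernel uptime stamp rather than the syslog wall-clock time avoids the clock skew between client and server noted in the report.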
* Re: Unstable NFS mount at heavy load.
From: G.R. @ 2013-01-08 16:25 UTC
To: xen-devel, Stefano Stabellini

Nobody has responded...

Stefano, could you point me to the PVNET owner?
I suspect this has something to do with the net emulation.

Thanks,
Timothy
* Re: Unstable NFS mount at heavy load.
From: Stefano Stabellini @ 2013-01-08 17:15 UTC
To: G.R.; +Cc: Stefano Stabellini, xen-devel

Do you mean the maintainer of the Linux PV network frontend and backend
drivers (netfront and netback)?
That would be Konrad.
* Re: Unstable NFS mount at heavy load.
From: G.R. @ 2013-01-09 8:47 UTC
To: Konrad Rzeszutek Wilk; +Cc: Stefano Stabellini, xen-devel

Hi Konrad,
Do you have any suggestions on how to troubleshoot the NFS mount issue
described at the start of this thread?
The broken connection is quite suspicious to me.

Thanks,
Timothy
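The numbers in the server-side messages of the original report are Linux errno values: 104 is ECONNRESET and 107 is ENOTCONN, which fits the suspicion that the TCP connection was torn down under the server. As a small sketch, the lookup can be automated for such log lines. The table below is hard-coded with the Linux values because other operating systems number these errors differently; `LINUX_ERRNO`, `ERR_RE` and `annotate` are illustrative names, not part of nfsd or any existing tool.

```python
import re

# Linux errno numbers and their strerror() texts. Only the two values that
# appear in the server log of the report are listed here.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Matches "error -104" as well as "err 107" and captures the number.
ERR_RE = re.compile(r"\b(?:error|err) -?(\d+)")


def annotate(line):
    """Append the symbolic errno name to a log line, if it carries one."""
    match = ERR_RE.search(line)
    if match is None:
        return line
    code = int(match.group(1))
    name, text = LINUX_ERRNO.get(code, ("unknown", "not in this table"))
    return f"{line}  # {name}: {text}"


if __name__ == "__main__":
    for entry in (
        "rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket",
        "nfsd: peername failed (err 107)!",
    ):
        print(annotate(entry))
```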
* Re: Unstable NFS mount at heavy load.
From: G.R. @ 2013-01-15 16:50 UTC
To: Konrad Rzeszutek Wilk; +Cc: Stefano Stabellini, xen-devel

Hi Konrad, do you have any suggestions on how to debug this?

Thanks,
Timothy
* Re: Unstable NFS mount at heavy load.
From: Konrad Rzeszutek Wilk @ 2013-01-18 16:14 UTC
To: G.R.; +Cc: xen-devel, Stefano Stabellini

On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
> Hi Konrad, do you have any suggestion how to debug?

Is your dom0 32-bit or 64-bit? And what kind of network card are you using
for the NFS traffic?
* Re: Unstable NFS mount at heavy load.
  2013-01-18 16:14 ` Konrad Rzeszutek Wilk
@ 2013-01-20 16:01   ` G.R.
  2013-01-22 20:29     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 10+ messages in thread
From: G.R. @ 2013-01-20 16:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Stefano Stabellini

On Sat, Jan 19, 2013 at 12:14 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
>> Hi Konrad, do you have any suggestion how to debug?
>
> Is your dom0 32-bit or 64-bit? And what kind of network card are you
> using for the NFS traffic?
>
Both my dom0 and domU are 64-bit.
The physical card is an RTL8111/8168B (rev06) (10ec:8168).
The virtual card I use is e1000, but I guess this is not important,
since I've seen this in the log:
Jan  6 01:31:03 debvm kernel: [    0.000000] Netfront and the Xen
platform PCI driver have been compiled for this kernel: unplug
emulated NICs.

I'm thinking of dumping the traffic to check when I get some spare time.
Do you think this is a good idea, or do you have another suggestion?

Thanks,
Timothy

PS: I'm on xen testing 4.2.1. The dom0 is a debian 3.6.6 kernel. The
domU is a 3.6.9 kernel built from the debian source package.
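As a rough aid for reading the client-side log in the original report, the sketch below pairs each first "not responding" message with the next "OK" and prints how long the mount appeared stalled. The pairing rule is an assumption made for illustration: several RPC tasks log independently, so the spans are approximate. The names `CLIENT_LOG`, `LINE` and `outages` are hypothetical helpers, not part of any tool mentioned in the thread.

```python
import re

# Client-side (domU) kernel log from the original report. Only the kernel
# uptime stamp and the state are kept; the "//(N times ...)" notes are dropped.
CLIENT_LOG = """\
[  371.008142] nfs: server 192.168.1.8 not responding, still trying
[  379.928142] nfs: server 192.168.1.8 not responding, still trying
[  381.396143] nfs: server 192.168.1.8 not responding, still trying
[  399.452129] nfs: server 192.168.1.8 not responding, still trying
[  399.524210] nfs: server 192.168.1.8 not responding, still trying
[  400.964142] nfs: server 192.168.1.8 not responding, still trying
[  410.468787] nfs: server 192.168.1.8 OK
[  410.520202] nfs: server 192.168.1.8 OK
[  420.208141] nfs: server 192.168.1.8 not responding, still trying
[  424.367613] nfs: server 192.168.1.8 OK
[  425.764143] nfs: server 192.168.1.8 not responding, still trying
[  425.772031] nfs: server 192.168.1.8 OK
[  426.466328] nfs: server 192.168.1.8 OK
[  507.136150] nfs: server 192.168.1.8 not responding, still trying
[  555.170556] nfs: server 192.168.1.8 not responding, still trying
[  742.616155] nfs: server 192.168.1.8 not responding, still trying
[  873.880200] nfs: server 192.168.1.8 not responding, still trying
[  909.987313] nfs: server 192.168.1.8 OK
[  921.776152] nfs: server 192.168.1.8 not responding, still trying
[  929.314639] nfs: server 192.168.1.8 OK
[ 1019.584149] nfs: server 192.168.1.8 not responding, still trying
[ 1028.504158] nfs: server 192.168.1.8 not responding, still trying
[ 1067.565487] nfs: server 192.168.1.8 not responding, still trying
[ 1163.368977] nfs: server 192.168.1.8 OK
[ 1168.337859] nfs: server 192.168.1.8 OK
[ 1236.448135] nfs: server 192.168.1.8 not responding, still trying
[ 1471.960302] nfs: server 192.168.1.8 not responding, still trying
[ 1554.982479] nfs: server 192.168.1.8 OK
"""

# Captures the uptime stamp and the reported state of one log line.
LINE = re.compile(r"\[\s*(\d+\.\d+)\] nfs: server \S+ (not responding|OK)")


def outages(log):
    """Pair the first 'not responding' after a healthy period with the next 'OK'."""
    spans = []
    start = None
    for match in LINE.finditer(log):
        stamp = float(match.group(1))
        state = match.group(2)
        if state == "not responding" and start is None:
            start = stamp              # a stall begins
        elif state == "OK" and start is not None:
            spans.append((start, stamp))  # the stall ends
            start = None
    return spans


spans = outages(CLIENT_LOG)
for begin, end in spans:
    print(f"stalled from {begin:.3f}s to {end:.3f}s ({end - begin:.1f}s)")
print(f"longest stall: {max(end - begin for begin, end in spans):.1f}s")
```

Under this pairing rule the stalls range from well under a second to several minutes, which is in line with the domU appearing frozen while its NFS root is unavailable.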
* Re: Unstable NFS mount at heavy load.
  2013-01-20 16:01 ` G.R.
@ 2013-01-22 20:29   ` Konrad Rzeszutek Wilk
  2013-01-26 12:18     ` G.R.
  0 siblings, 1 reply; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-22 20:29 UTC (permalink / raw)
  To: G.R.; +Cc: xen-devel, Stefano Stabellini

On Mon, Jan 21, 2013 at 12:01:43AM +0800, G.R. wrote:
> On Sat, Jan 19, 2013 at 12:14 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
> >> Hi Konrad, do you have any suggestion how to debug?
> >
> > Is your dom0 32-bit or 64-bit? And what kind of network card are you
> > using for the NFS traffic?
> >
> I have both 64-bit dom0 && domU.
> The physical card I have is RTL8111/8168B (rev06) (10ec: 8168).
> And the virtual card I used is e1000, but I guess this is not
> important since I've seen this in the log:
> Jan 6 01:31:03 debvm kernel: [ 0.000000] Netfront and the Xen
> platform PCI driver have been compiled for this kernel: unplug
> emulated NICs.
>
> I'm thinking of dumping the traffic to check when I got spare time.
> Do you think this is a good idea or do you have other suggestion?

Well, the thread on "Fatal crash on xen4.2 HVM + qemu-xen dm + NFS"
seems to imply that this is a problem with NFS tcp-retransmit.

And I've seen similar issues as well, but only on skge, tg3, and
r8169, and only when using the 32-bit domain0.
I don't know if the issue I am hitting is the same thing.

>
> Thanks,
> Timothy
>
> PS: I'm on xen testing 4.2.1. The dom0 is a debian 3.6.6 kernel. The
> domU is a 3.6.9 kernel built from debian source package.
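The server-side log in the original report shows nfsd failing with "error -104" and "peername failed (err 107)". A minimal sketch for decoding those numbers follows. The table is hard-coded with the Linux errno values so the lookup does not depend on the machine running the script; `LINUX_ERRNO` and `describe` are hypothetical names used only for illustration.

```python
# Linux errno values seen in the server log, hard-coded on purpose:
# the numbering differs on other operating systems.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}


def describe(code):
    """Return 'NAME: text' for a kernel error code, ignoring its sign."""
    name, text = LINUX_ERRNO[abs(code)]
    return f"{name}: {text}"


for code in (-104, 107):
    print(code, "->", describe(code))
```

Read this way, the server log says the peer reset the TCP connection while nfsd was sending a reply, and the socket was already gone by the time nfsd asked for the peer name.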
* Re: Unstable NFS mount at heavy load.
  2013-01-22 20:29 ` Konrad Rzeszutek Wilk
@ 2013-01-26 12:18   ` G.R.
  2013-01-26 16:17     ` G.R.
  0 siblings, 1 reply; 10+ messages in thread
From: G.R. @ 2013-01-26 12:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

On Wed, Jan 23, 2013 at 4:29 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Mon, Jan 21, 2013 at 12:01:43AM +0800, G.R. wrote:
>> On Sat, Jan 19, 2013 at 12:14 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
>> >> Hi Konrad, do you have any suggestion how to debug?
>> >
>> > Is your dom0 32-bit or 64-bit? And what kind of network card are you
>> > using for the NFS traffic?
>> >
>> I have both 64-bit dom0 && domU.
>> The physical card I have is RTL8111/8168B (rev06) (10ec: 8168).
>> And the virtual card I used is e1000, but I guess this is not
>> important since I've seen this in the log:
>> Jan 6 01:31:03 debvm kernel: [ 0.000000] Netfront and the Xen
>> platform PCI driver have been compiled for this kernel: unplug
>> emulated NICs.
>>
>> I'm thinking of dumping the traffic to check when I got spare time.
>> Do you think this is a good idea or do you have other suggestion?
>
> Well, the thread on "Fatal crash on xen4.2 HVM + qemu-xen dm + NFS"
> seems to imply that this a problem with NFS tcp-retransmit.
>
> And I've seen similar issues as well - but only on skge, tg3, and
> r8169 - but only when using the 32-bit domain0.
> I don't know if the issue I am hitting is the same thing.
>

I checked the thread and unfortunately did not find anything conclusive.
In my case, my dom0 seems to work fine and even the domU is still alive:
everything is back in order once the mount recovers (typically within a
couple of minutes).

According to the traffic I captured, the server is kind of busy and keeps
sending ZeroWindow for a while, and the client in domU resets the
connection after retrying 6 times within 15 seconds.
I'm not sure whether this is correct client behavior, or whether the
server is the one misbehaving.
But why does this only happen with the domU client?

Please find the traffic log in the attached file.
I've captured the traffic on both the server and the domU, and the two
captures appear to match.

Thanks,
Timothy

[-- Attachment #2: server.view --]
[-- Type: application/octet-stream, Size: 3916 bytes --]

No.  Time               Source      Destination Protocol Length Info
4717 18:17:06.912678000 192.168.1.8 192.168.1.5 TCP      66     nfs > 914 [ACK] Seq=17601 Ack=17986685 Win=46 Len=0 TSval=31207570 TSecr=1280760
4718 18:17:06.913096000 192.168.1.5 192.168.1.8 TCP      5858   914 > nfs [ACK] Seq=17986685 Ack=17601 Win=7779 Len=5792 TSval=1280770 TSecr=31207570
4719 18:17:06.952672000 192.168.1.8 192.168.1.5 TCP      66     nfs > 914 [ACK] Seq=17601 Ack=17992477 Win=1 Len=0 TSval=31207610 TSecr=1280770
4720 18:17:07.177203000 192.168.1.5 192.168.1.8 TCP      194    914 > nfs [PSH, ACK] Seq=17992477 Ack=17601 Win=7779 Len=128 TSval=1280836 TSecr=31207610
4721 18:17:07.177225000 192.168.1.8 192.168.1.5 TCP      66     [TCP ZeroWindow] nfs > 914 [ACK] Seq=17601 Ack=17992605 Win=0 Len=0 TSval=31207834 TSecr=1280836
4722 18:17:07.396995000 192.168.1.5 192.168.1.8 TCP      66     [TCP Keep-Alive] 914 > nfs [ACK] Seq=17992604 Ack=17601 Win=7779 Len=0 TSval=1280891 TSecr=31207834
4723 18:17:07.397003000 192.168.1.8 192.168.1.5 TCP      66     [TCP ZeroWindow] nfs > 914 [ACK] Seq=17601 Ack=17992605 Win=0 Len=0 TSval=31208054 TSecr=1280836
4726 18:17:07.837025000 192.168.1.5 192.168.1.8 TCP      66     [TCP Keep-Alive] 914 > nfs [ACK] Seq=17992604 Ack=17601 Win=7779 Len=0 TSval=1281001 TSecr=31208054
4727 18:17:07.837045000 192.168.1.8 192.168.1.5 TCP      66     [TCP ZeroWindow] nfs > 914 [ACK] Seq=17601 Ack=17992605 Win=0 Len=0 TSval=31208494 TSecr=1280836
4732 18:17:08.717043000 192.168.1.5 192.168.1.8 TCP      66     [TCP Keep-Alive] 914 > nfs [ACK] Seq=17992604 Ack=17601 Win=7779 Len=0 TSval=1281221 TSecr=31208494
4733 18:17:08.717063000 192.168.1.8 192.168.1.5 TCP      66     [TCP ZeroWindow] nfs > 914 [ACK] Seq=17601 Ack=17992605 Win=0 Len=0 TSval=31209374 TSecr=1280836
4740 18:17:10.481113000 192.168.1.5 192.168.1.8 TCP      66     [TCP Keep-Alive] 914 > nfs [ACK] Seq=17992604 Ack=17601 Win=7779 Len=0 TSval=1281662 TSecr=31209374
4741 18:17:10.481131000 192.168.1.8 192.168.1.5 TCP      66     [TCP ZeroWindow] nfs > 914 [ACK] Seq=17601 Ack=17992605 Win=0 Len=0 TSval=31211138 TSecr=1280836
4759 18:17:14.009182000 192.168.1.5 192.168.1.8 TCP      66     [TCP Keep-Alive] 914 > nfs [ACK] Seq=17992604 Ack=17601 Win=7779 Len=0 TSval=1282544 TSecr=31211138
4760 18:17:14.009202000 192.168.1.8 192.168.1.5 TCP      66     [TCP ZeroWindow] nfs > 914 [ACK] Seq=17601 Ack=17992605 Win=0 Len=0 TSval=31214666 TSecr=1280836
4791 18:17:21.065375000 192.168.1.5 192.168.1.8 TCP      66     [TCP Keep-Alive] 914 > nfs [ACK] Seq=17992604 Ack=17601 Win=7779 Len=0 TSval=1284308 TSecr=31214666
4792 18:17:21.065388000 192.168.1.8 192.168.1.5 TCP      66     [TCP ZeroWindow] nfs > 914 [ACK] Seq=17601 Ack=17992605 Win=0 Len=0 TSval=31221722 TSecr=1280836
4797 18:17:22.521358000 192.168.1.5 192.168.1.8 TCP      66     914 > nfs [RST, ACK] Seq=17992605 Ack=17601 Win=7779 Len=0 TSval=1284672 TSecr=31221722
4810 18:17:25.529464000 192.168.1.5 192.168.1.8 TCP      74     [TCP Port numbers reused] 914 > nfs [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=1285424 TSecr=0 WS=128
4811 18:17:25.529488000 192.168.1.8 192.168.1.5 TCP      74     nfs > 914 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=31226186 TSecr=1285424 WS=128
4812 18:17:25.529687000 192.168.1.5 192.168.1.8 TCP      66     914 > nfs [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=1285424 TSecr=31226186
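The "6 retries within 15 seconds" observation can be checked directly against the capture. The sketch below takes the timestamps of the client frames that the capture labels [TCP Keep-Alive], measures the gap between consecutive probes, and measures the time from the first ZeroWindow to the client's RST. The timestamps are copied from server.view (trailing nanosecond zeros trimmed); the helper names `to_seconds` and `gaps` are hypothetical and exist only for this illustration.

```python
# Timestamps copied from the server.view capture.
FIRST_ZERO_WINDOW = "18:17:07.177225"  # frame 4721: server advertises Win=0
PROBES = [                             # client frames labelled [TCP Keep-Alive]
    "18:17:07.396995",  # frame 4722
    "18:17:07.837025",  # frame 4726
    "18:17:08.717043",  # frame 4732
    "18:17:10.481113",  # frame 4740
    "18:17:14.009182",  # frame 4759
    "18:17:21.065375",  # frame 4791
]
RESET = "18:17:22.521358"              # frame 4797: client sends [RST, ACK]


def to_seconds(stamp):
    """Convert an HH:MM:SS.ffffff capture timestamp to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def gaps(stamps):
    """Return the delay between each pair of consecutive timestamps."""
    times = [to_seconds(s) for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


probe_gaps = gaps(PROBES)
# Ratio of each gap to the previous one; values near 2 mean exponential backoff.
growth = [later / earlier for earlier, later in zip(probe_gaps, probe_gaps[1:])]
stall = to_seconds(RESET) - to_seconds(FIRST_ZERO_WINDOW)

print("probe gaps (s):", [round(g, 3) for g in probe_gaps])
print("growth factors:", [round(g, 2) for g in growth])
print(f"first ZeroWindow to RST: {stall:.2f}s")
```

The gaps between probes roughly double each time, which is what an exponential backoff looks like, and the server answers every probe with Win=0 until the client gives up and resets the connection.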
* Re: Unstable NFS mount at heavy load.
  2013-01-26 12:18 ` G.R.
@ 2013-01-26 16:17   ` G.R.
  0 siblings, 0 replies; 10+ messages in thread
From: G.R. @ 2013-01-26 16:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Stefano Stabellini

>
> I checked the thread and unfortunately did not find anything conclusive.
> In my case, my dom0 seems to work fine and even the domU is still alive
> -- everything back to order after the mount recovered (typically in a
> couple of minutes).
>
> According to the traffic I captured, server is kind of busy and keep
> sending ZeroWindow for a while
> and the client in domU reset the connection after retrying 6 times
> within 15 seconds.
> I'm not sure if this is a correct client behavior while the server is
> doing wrong.
> But why does this only happen in domU client?

Well, I have to say sorry about this thread.
After some more experiments, I found that the symptom is not specific to domU.
Both dom0 and a non-Xen system suffer from this issue, so this must be a
server fault and has nothing to do with Xen.
This may have something to do with my unusual setup (ext4 on a loop image,
mounted on the server and exported through NFS).
In any case, the problem does not seem to be fixed in recent kernels
(3.6.11 tried).

Thanks,
Timothy
end of thread, other threads:[~2013-01-26 16:17 UTC | newest]

Thread overview: 10+ messages
     [not found] <CAKhsbWbonNA-wdQYKZEAZ2pJOkxK5gtGXCY0YCP3hGF98_idBg@mail.gmail.com>
2013-01-05  5:12 ` Fwd: Unstable NFS mount at heavy load G.R.
2013-01-08 16:25   ` G.R.
2013-01-08 17:15     ` Stefano Stabellini
2013-01-09  8:47       ` G.R.
2013-01-15 16:50         ` G.R.
2013-01-18 16:14           ` Konrad Rzeszutek Wilk
2013-01-20 16:01             ` G.R.
2013-01-22 20:29               ` Konrad Rzeszutek Wilk
2013-01-26 12:18                 ` G.R.
2013-01-26 16:17                   ` G.R.