From mboxrd@z Thu Jan 1 00:00:00 1970
From: Umair Azam
Subject: Re: NFS issue with xenserver 6.2
Date: Fri, 14 Mar 2014 06:25:22 +0500
Message-ID: <53225A82.7040700@i2cinc.com>
References: <531E5B35.3030202@i2cinc.com> <531F2D90.3090409@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <531F2D90.3090409@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Zoltan Kiss, xen-devel@lists.xen.org, xen-devel-request@lists.xen.org, "xs-devel@lists.xenserver.org"
List-Id: xen-devel@lists.xenproject.org

Hi Zoli,

When the NFS timeout log entries appear, I am able to ping the storage
machine, which remains up with almost no load. However, according to
xentop, the XenServer load goes up to 70% (dom0 has 3 vCPUs and 1 GB RAM
on a Core 2 Duo machine), and the CPU usage of the CloudStack secondary
storage VM goes up to 160%.

The problem arises when CloudStack tries to launch the secondary storage
VM on the hypervisor. At that point, "nfs server not responding, timed
out" log entries begin to appear on the XenServer host, and then the
machine reboots itself (possibly because HA is enabled). I have replaced
the Ethernet cables, the switch, and the NICs, but I am still facing this
issue and cannot figure out why it arises.

I have also seen the following entries appearing many times in the logs.

Mar 14 06:04:30 xenserver-1 scripts-vif: Called as "add vif" domid:2 devid:0 mode:bridge
Mar 14 06:04:30 xenserver-1 scripts-vif: Called as "online vif" domid:2 devid:0 mode:bridge
Mar 14 06:04:30 xenserver-1 scripts-vif: Setting vif2.0 MTU 1500
Mar 14 06:04:30 xenserver-1 scripts-vif: Adding vif2.0 to xapi0 with address fe:ff:ff:ff:ff:ff
Mar 14 06:04:30 xenserver-1 scripts-vif: Failed to ip link set vif2.0 address fe:ff:ff:ff:ff:ff
Mar 14 06:04:30 xenserver-1 kernel: [ 2890.509223] device vif2.0 entered promiscuous mode
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - Called with vif_type=vif, domid=2, devid=0, network_mode=bridge, action=filter
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - attempting to acquire lock /var/lock/ebtables.lock
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - acquired lock /var/lock/ebtables.lock
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - ['/sbin/ip', 'link', 'set', 'vif2.0', 'down']
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - ['/sbin/ebtables', '-L', 'FORWARD_vif2.0']
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - ['/usr/bin/xenstore-read', '/local/domain/0/backend/vif/2/0/mac']
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - ['/usr/bin/xenstore-read', '/xapi/2/private/vif/0/locking-mode']
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - ['/usr/bin/xenstore-read', '/xapi/2/private/vif/0/ipv4-allowed']
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - ['/usr/bin/xenstore-read', '/xapi/2/private/vif/0/ipv6-allowed']
Mar 14 06:04:30 xenserver-1 python: /opt/xensource/libexec/setup-vif-rules[23804] - Got locking config: MAC=0e:00:a9:fe:00:68; locking_mode=unlocked; ipv4_allowed=; ipv6_allowed=

Umair Azam

On 3/11/2014 8:36 PM, Zoltan Kiss wrote:
> On 11/03/14 00:39, Umair Azam wrote:
>> Hi,
>>
>> I am using xenserver 6.2 and facing nfs timed out issue, this issue has
>> been mentioned in 6.0 release notes but why i m facing this issue in
>> latest release (6.2)
>>
>> Mar 11 02:49:05 xenserver-1 kernel: [ 1848.148548] nfs: server
>> 10.11.17.33 not responding, timed out
>>
>>   * In some 10 Gigabit Ethernet environments, occasional performance
>>     problems with disk throughput on NFS SRs have been observed. The
>>     problem can be identified by a log entry in /var/log/messages similar
>>     to: kernel: nfs: server 10.0.0.1 not responding, timed out. Citrix
>>     continues to investigate this issue with an aim to resolve it in a
>>     future release. [CA-59187]
>>
>> http://support.citrix.com/article/CTX130418
>
> That problem were solved a long time ago, this is probably something
> different. If reproducible, you should check why the host lose
> connection with the NFS server. Things to check:
> - can you ping its IP?
> - what is the load? top, xentop, "watch -n 1 ovs-dpctl show" can be
> useful here, the latter shows how many network flows you have at one
> time in OVS. Rapid increase (ie more than a hundred per second) in
> "missed: " shows lots of connections going around
> - "ovs-dpctl dump-flows " shows the actual flows, you can
> actually see if there is a flow entry for that traffic
>
> I can't comment on how to debug on the storage manager side, but
> previous ones could be useful.
>
> Zoli
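
For the "missed:" check suggested above, here is a rough sketch of how the rate could be measured instead of eyeballing "watch -n 1 ovs-dpctl show". It assumes the counters appear in the output on a line like "lookups: hit:N missed:N lost:N" (please verify against the output on your own host), and that a Python 3 interpreter is available wherever you run it. The function names (parse_missed, missed_per_second, sample_missed) are made up for this example and are not part of any XenServer or OVS tooling. The threshold of 100 per second is the figure mentioned in the advice above.

```python
import re
import subprocess
import time

# Assumed format of the counters line in `ovs-dpctl show` output, e.g.
#   lookups: hit:37994604 missed:1509 lost:0
MISSED_RE = re.compile(r"\bmissed:\s*(\d+)")


def parse_missed(show_output):
    """Sum the 'missed:' counters over all datapaths found in the text."""
    return sum(int(n) for n in MISSED_RE.findall(show_output))


def missed_per_second(before, after, interval_s):
    """Average growth of the missed counter over the sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (after - before) / interval_s


def sample_missed():
    """Run `ovs-dpctl show` once and return the summed missed counter."""
    raw = subprocess.check_output(["ovs-dpctl", "show"])
    return parse_missed(raw.decode("utf-8", "replace"))


if __name__ == "__main__":
    interval = 1.0  # seconds between samples, same cadence as `watch -n 1`
    previous = sample_missed()
    while True:
        time.sleep(interval)
        current = sample_missed()
        rate = missed_per_second(previous, current, interval)
        # Flag anything above roughly a hundred misses per second.
        flag = "  <-- high" if rate > 100 else ""
        print("missed/s: %.0f%s" % (rate, flag))
        previous = current
```

Running this while CloudStack starts the secondary storage VM should show whether the missed counter jumps at the same moment the NFS timeouts begin.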