From mboxrd@z Thu Jan 1 00:00:00 1970
From: Srinivas Eeda
Date: Tue, 23 Jul 2013 17:59:31 -0700
Subject: [Ocfs2-devel] Is it an issue and whether the code changed correct? Thanks a lot
In-Reply-To: <51E64E08.3090003@oracle.com>
References: <71604351584F6A4EBAE558C676F37CA417BD8D31@H3CMLB02-EX.srv.huawei-3com.com> <51E64E08.3090003@oracle.com>
Message-ID: <51EF26F3.6080408@oracle.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ocfs2-devel@oss.oracle.com

When a network timeout happens, one node can time out before the other.
The node that runs into it first runs o2net_idle_timer, which initiates
a socket shutdown. The socket shutdown leads to sending TCP_CLOSE to the
other end. If o2net_idle_timer ran on the lower-numbered node, then
nn->nn_timeout won't get set on the higher-numbered node, because that
node ran into TCP_CLOSE before its own timeout fired. Since nn->nn_timeout
is not set to 1, the higher-numbered node doesn't initiate a reconnect.

So the fix is to set nn->nn_timeout to 1. Either we should move
"atomic_set(&nn->nn_timeout, 1)" from o2net_idle_timer to
o2net_set_nn_state, or we should set it in o2net_state_change as well.

We made this patch along with a few other changes and will send it
shortly, or you could send a proper patch based on Jeff's comments.

On 07/17/2013 12:55 AM, Jeff Liu wrote:
> [Add Srinivas/Xiaofei to CC list as they are investigating OCFS2 net related issues]
>
> Hi Guo,
>
> Thanks for your reports and analysis!
>
> On 07/16/2013 05:06 PM, Guozhonghua wrote:
>
>> Hi, everyone, is that an issue?
>>
> That is an issue because we should keep attempting to reconnect
> until the connection is established or a disk heartbeat down
> event is captured.
>
> This strategy is described in upstream commit:
> 5cc3bf2786f63cceb191c3c02ddd83c6f38a7d64
> ocfs2: Reconnect after idle time out.
>
>
>> The server version is Linux 3.2.0-23, Ubuntu 12.04.
> Generally speaking, we dig into potential problems against the
> up-to-date mainline source tree; linux-next is fine for OCFS2.
> One important reason is that an issue seen on an old release
> might have been fixed recently.
>
>> There are 4 nodes in the OCFS2 cluster, using three iSCSI LUNs, and
>> every LUN is one OCFS2 domain mounted by thread node.
>>
>>
>>
>> When the network used by the nodes goes down and comes back up once,
>> the TCP connections between the nodes shut down and reconnect.
>
>> But there is one scenario: when the node with the smaller node number
>> shuts down the TCP connection to the node with the larger node number,
>> the larger-numbered node will not reconnect to the smaller-numbered node.
>>
>> Conversely, if the node with the larger node number shuts down the
>> TCP connection to the node with the smaller number, the larger-numbered
>> node reconnects to the smaller-numbered node OK.
> Could you please clarify your test scenario in a bit more detail?
>
> Anyway, re-initializing the timeout to trigger reconnection looks fair to me,
> but I'd like to see some comments from Srinivas and Xiaofei.
>
> Btw, it would be better if you made the patch via git and set up your email
> client by following the instructions at Documentation/email-clients.txt.
> Please feel free to drop me an offline email if you have any questions
> regarding this.
>
>
> Thanks,
> -Jeff
>
>>
>>
>> Such as below:
>>
>> The server1 syslog is as below:
>>
>> Jul 9 17:46:10 server1 kernel: [5199872.576027] o2net: Connection to
>> node server2 (num 2) at 192.168.70.20:7100 shutdown, state 8
>>
>> Jul 9 17:46:10 server1 kernel: [5199872.576111] o2net: No longer
>> connected to node server2 (num 2) at 192.168.70.20:7100
>>
>> Jul 9 17:46:10 server1 kernel: [5199872.576149]
>> (ocfs2dc,14358,1):dlm_send_remote_convert_request:395 ERROR: Error -107
>> when sending message 504 (key 0x3671059b) to node 2
>>
>> Jul 9 17:46:10 server1 kernel: [5199872.576162] o2dlm: Waiting on the
>> death of node 2 in domain 3656D53908DC4149983BDB1DBBDF1291
>>
>> Jul 9 17:46:10 server1 kernel: [5199872.576428] o2net: Accepted
>> connection from node server2 (num 2) at 192.168.70.20:7100
>>
>> Jul 9 17:46:11 server1 kernel: [5199872.995898] o2net: Connection to
>> node server3 (num 3) at 192.168.70.30:7100 has been idle for 30.100
>> secs, shutting it down.
>>
>> Jul 9 17:46:11 server1 kernel: [5199872.995987] o2net: No longer
>> connected to node server3 (num 3) at 192.168.70.30:7100
>>
>> Jul 9 17:46:11 server1 kernel: [5199873.069666] o2net: Connection to
>> node server4 (num 4) at 192.168.70.40:7100 shutdown, state 8
>>
>> Jul 9 17:46:11 server1 kernel: [5199873.069700] o2net: No longer
>> connected to node server4 (num 4) at 192.168.70.40:7100
>>
>> Jul 9 17:46:11 server1 kernel: [5199873.070385] o2net: Accepted
>> connection from node server4 (num 4) at 192.168.70.40:7100
>>
>>
>>
>> Server1 shut down the TCP connection with server3, but server3 never
>> reconnected to server1.
>>
>>
>>
>> The server3 syslog is as below:
>>
>> Jul 9 17:44:12 server3 kernel: [3971907.332698] o2net: Connection to
>> node server1 (num 1) at 192.168.70.10:7100 shutdown, state 8
>>
>> Jul 9 17:44:12 server3 kernel: [3971907.332748] o2net: No longer
>> connected to node server1 (num 1) at 192.168.70.10:7100
>>
>> Jul 9 17:44:42 server3 kernel: [3971937.355419] o2net: No connection
>> established with node 1 after 30.0 seconds, giving up.
>>
>> Jul 9 17:45:01 server3 CRON[52349]: (root) CMD (command -v debian-sa1 >
>> /dev/null && debian-sa1 1 1)
>>
>> Jul 9 17:45:12 server3 kernel: [3971967.421656] o2net: No connection
>> established with node 1 after 30.0 seconds, giving up.
>>
>> Jul 9 17:45:42 server3 kernel: [3971997.487949] o2net: No connection
>> established with node 1 after 30.0 seconds, giving up.
>>
>> Jul 9 17:46:12 server3 kernel: [3972027.554258] o2net: No connection
>> established with node 1 after 30.0 seconds, giving up.
>>
>> Jul 9 17:46:42 server3 kernel: [3972057.620496] o2net: No connection
>> established with node 1 after 30.0 seconds, giving up.
>>
>>
>>
>> Server2 and server4 shut down their connections with server1 and
>> reconnected OK.
>>
>>
>>
>> I reviewed the ocfs2 kernel code and found that this may be an issue or bug.
>>
>>
>>
>> As server1 did not receive messages from server3, it shut down the
>> connection with server3 and set the timeout to 1.
>>
>> Server1's node number is smaller than server3's, so it waits for the
>> connect request from server3.
>>
>> static void o2net_idle_timer(unsigned long data)
>> {
>> 	...
>> 	printk(KERN_NOTICE "o2net: Connection to " SC_NODEF_FMT " has been "
>> 	       "idle for %lu.%lu secs, shutting it down.\n", SC_NODEF_ARGS(sc),
>> 	       msecs / 1000, msecs % 1000);
>> 	...
>> 	atomic_set(&nn->nn_timeout, 1);
>> 	o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
>> }
>>
>>
>>
>> But server3, on seeing the TCP connection state change, shuts down the
>> connection as well, and it never reconnects to server1 because its
>> nn->nn_timeout is 0.
>>
>>
>>
>> static void o2net_state_change(struct sock *sk)
>> {
>> 	...
>> 	switch(sk->sk_state) {
>> 	...
>> 	default:
>> 		printk(KERN_INFO "AAAAA o2net: Connection to " SC_NODEF_FMT
>> 		       " shutdown, state %d\n",
>> 		       SC_NODEF_ARGS(sc), sk->sk_state);
>> 		o2net_sc_queue_work(sc, &sc->sc_shutdown_work);
>> 		break;
>> 	}
>> 	...
>> }
>>
>>
>>
>> I tested the TCP connection without any shutdown between nodes, but
>> sending messages fails because the connection is in an error state.
>>
>>
>>
>>
>>
>> I changed the connect trigger code in the functions o2net_set_nn_state
>> and o2net_start_connect, and the connection is now re-established OK.
>>
>> Could anyone review whether the code change is correct? Thanks a lot.
>>
>>
>>
>> root@gzh-dev:~/ocfs2# diff -p -C 10 ./ocfs2_org/cluster/tcp.c ocfs2_rep/cluster/tcp.c
>> *** ./ocfs2_org/cluster/tcp.c 2012-10-29 19:33:19.534200000 +0800
>> --- ocfs2_rep/cluster/tcp.c 2013-07-16 16:58:31.380452531 +0800
>> *************** static void o2net_set_nn_state(struct o2
>> *** 567,586 ****
>> --- 567,590 ----
>>   	if (!valid && o2net_wq) {
>>   		unsigned long delay;
>>   		/* delay if we're within a RECONNECT_DELAY of the
>>   		 * last attempt */
>>   		delay = (nn->nn_last_connect_attempt +
>>   			 msecs_to_jiffies(o2net_reconnect_delay()))
>>   			- jiffies;
>>   		if (delay > msecs_to_jiffies(o2net_reconnect_delay()))
>>   			delay = 0;
>>   		mlog(ML_CONN, "queueing conn attempt in %lu jiffies\n", delay);
>> +
>> + 		/** Trigger the reconnection */
>> + 		atomic_set(&nn->nn_timeout, 1);
>> +
>>   		queue_delayed_work(o2net_wq, &nn->nn_connect_work, delay);
>>
>>   		/*
>>   		 * Delay the expired work after idle timeout.
>>   		 *
>>   		 * We might have lots of failed connection attempts that run
>>   		 * through here but we only cancel the connect_expired work when
>>   		 * a connection attempt succeeds. So only the first enqueue of
>>   		 * the connect_expired work will do anything. The rest will see
>>   		 * that it's already queued and do nothing.
>> *************** static void o2net_start_connect(struct w
>> *** 1691,1710 ****
>> --- 1695,1719 ----
>>   	remoteaddr.sin_family = AF_INET;
>>   	remoteaddr.sin_addr.s_addr = node->nd_ipv4_address;
>>   	remoteaddr.sin_port = node->nd_ipv4_port;
>>
>>   	ret = sc->sc_sock->ops->connect(sc->sc_sock,
>>   					(struct sockaddr *)&remoteaddr,
>>   					sizeof(remoteaddr),
>>   					O_NONBLOCK);
>>   	if (ret == -EINPROGRESS)
>>   		ret = 0;
>> +
>> + 	/** Reset the timeout with 0 to avoid connection again, Just for test the tcp connection */
>> + 	if (ret == 0) {
>> + 		atomic_set(&nn->nn_timeout, 0);
>> + 	}
>>
>>   out:
>>   	if (ret) {
>>   		printk(KERN_NOTICE "o2net: Connect attempt to " SC_NODEF_FMT
>>   		       " failed with errno %d\n", SC_NODEF_ARGS(sc), ret);
>>   		/* 0 err so that another will be queued and attempted
>>   		 * from set_nn_state */
>>   		if (sc)
>>   			o2net_ensure_shutdown(nn, sc, 0);
>>   	}
>>
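
To make the reconnect condition described at the top of this message
concrete, here is a minimal userspace sketch in C. It is not the kernel
code. It assumes, going by the discussion in this thread, that only the
higher-numbered node initiates a connect and that, once a connection has
dropped, that node retries only when nn_timeout is set. The struct
model_nn, its reduced set of fields, and the helpers should_start_connect
and on_peer_shutdown are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-node o2net state. */
struct model_nn {
	int this_node;        /* our own node number */
	int peer_node;        /* node number of the peer */
	int persistent_error; /* 0, or a negative errno left by a failed connection */
	int nn_timeout;       /* 1 if a dropped connection should be retried */
};

/*
 * Returns true if this node should start a connect attempt to the peer.
 * Assumption taken from the thread: the higher-numbered node connects,
 * the lower-numbered node only accepts.
 */
static bool should_start_connect(const struct model_nn *nn)
{
	assert(nn != NULL);

	/* Lower (or equal) node number: wait for the peer to connect. */
	if (nn->this_node <= nn->peer_node)
		return false;

	/* No error recorded: nothing blocks a connect attempt. */
	if (nn->persistent_error == 0)
		return true;

	/* After a dropped connection, retry only if nn_timeout was set. */
	return nn->persistent_error == -ENOTCONN && nn->nn_timeout != 0;
}

/* Models the proposed fix: mark the peer for reconnect on any shutdown. */
static void on_peer_shutdown(struct model_nn *nn)
{
	assert(nn != NULL);
	nn->persistent_error = -ENOTCONN;
	nn->nn_timeout = 1;
}
```

In this model, a node 3 whose connection to node 1 dropped with nn_timeout
still 0 never retries, which matches the server3 log above. Setting the
flag on the shutdown path, wherever the real patch ends up doing that,
lets the same check pass.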