From: Łukasz Maśko
Subject: How to deal with such hanging processes?
Date: Fri, 27 Jan 2012 21:33:35 +0100
Message-ID: <201201272133.35986@laptok.ed.pl>
To: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I have a Welland ME-752GNS NAS storage device. It can serve files only over the FTP or CIFS protocol. For quick data transfers I use FTP, but if I want to mount the disk, for instance to browse my images or watch movies, I have to use CIFS. It seems to work, but not too well.

First, I realise that my problems come mainly from the poor CIFS implementation in the NAS firmware, but since it is the only device I have now and I cannot afford to replace it, I must somehow live with it. The main problem is that something quite often goes wrong with the data transfer. First, it results in entries like these in dmesg and the logs:

[ 5743.489573] CIFS VFS: ignoring corrupt resume name
[ 5743.553028] CIFS VFS: ignoring corrupt resume name
[ 5743.652823] CIFS VFS: ignoring corrupt resume name
[ 5744.822936] CIFS VFS: ignoring corrupt resume name
[ 5758.608685] CIFS VFS: ignoring corrupt resume name
[ 5770.010003] CIFS VFS: ignoring corrupt resume name
[ 5792.937939] CIFS VFS: Send error in read = -512
[ 5792.938948] CIFS VFS: No task to wake, unknown frame received! NumMids 2
[ 5792.938958] Received Data is: : dump of 37 bytes of data at 0xf4f4b6c0
[ 5792.938974] 60000000 424d53ff 0000a4a4 c0018000 . . . ` \xffffffff S M B ¤ ¤ . . . . . Ŕ
[ 5792.938988] 00000000 00000000 00000000 2e130006 . . . . . . . . . . . . . . . .
[ 5792.938996] 67950002 00000012 . . . g .
The "CIFS VFS: ignoring corrupt resume name" message in particular appears very often, but it does not cause any major problems.

Then, though not always, the process performing the data transfer hangs and I get the following errors:

[ 6120.569517] INFO: task kio_file:12029 blocked for more than 120 seconds.
[ 6120.569521] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6120.569525] kio_file D e417bc30 0 12029 6037 0x00000004
[ 6120.569533] e417bcb4 00000086 e417bc30 e417bc30 e417bc38 31a0b404 00000559 00000000
[ 6120.569543] c0724a80 e417bc58 c0724a80 f6707a80 f60ab180 f0c539c0 00000020 00000000
[ 6120.569552] e427e3c0 e5602938 00000020 e560293c 000003b7 00000010 c0664940 e417bcb4
[ 6120.569561] Call Trace:
[ 6120.569575] [] ? ktime_get_ts+0xdc/0x110
[ 6120.569583] [] schedule+0x30/0x50
[ 6120.569588] [] io_schedule+0x73/0xb0
[ 6120.569594] [] sleep_on_page+0x8/0x10
[ 6120.569599] [] __wait_on_bit_lock+0x47/0x90
[ 6120.569604] [] ? __lock_page+0x80/0x80
[ 6120.569609] [] __lock_page+0x76/0x80
[ 6120.569616] [] ? autoremove_wake_function+0x40/0x40
[ 6120.569623] [] __generic_file_splice_read+0x52d/0x550
[ 6120.569630] [] ? sock_alloc_send_pskb+0x15c/0x290
[ 6120.569636] [] ? __alloc_skb+0x5b/0x210
[ 6120.569640] [] ? sock_alloc_send_pskb+0x15c/0x290
[ 6120.569647] [] ? _copy_from_user+0x3d/0x60
[ 6120.569652] [] ? skb_queue_tail+0x37/0x50
[ 6120.569659] [] ? unix_stream_sendmsg+0x3d0/0x420
[ 6120.569665] [] ? page_cache_pipe_buf_release+0x20/0x20
[ 6120.569671] [] generic_file_splice_read+0x94/0x100
[ 6120.569677] [] ? __generic_file_splice_read+0x550/0x550
[ 6120.569682] [] do_splice_to+0x60/0x80
[ 6120.569687] [] splice_direct_to_actor+0xae/0x1d0
[ 6120.569692] [] ? do_splice_from+0x80/0x80
[ 6120.569698] [] do_splice_direct+0x4d/0x70
[ 6120.569705] [] do_sendfile+0x181/0x220
[ 6120.569710] [] sys_sendfile64+0x53/0xc0
[ 6120.569716] [] sysenter_do_call+0x12/0x28

I am unable to kill this process, and it prevents the share from being unmounted:

$ ps ax | grep kio_file
12029 ? D 0:00 kdeinit4: kio_file [kdeinit] file local:/home/users/ed/tmp/ksocket-ed/klauncherTi6038.slave-socket local:/home/users/ed/tmp/ksocket-ed/dolphinU11997.slave-socket

So far I have found that the following combination works. First, I can unmount the share with the -l (lazy) option, but the process in question still exists. Second, I can turn the NAS off, wait for a moment, turn it on again (I am not 100% sure whether restarting the NAS is required here, but it works) and reload the cifs.ko module. As a result, the process is gone and I can keep on working. Until the problem occurs again...

I am using PLD Linux (which is probably not important). I have a vanilla kernel, currently 3.2.2, but the same has been happening since 2.6.x (the only improvement after moving to 3.2 is a big performance jump). I have cifs-utils-5.2 installed and I load the cifs.ko module with the following parameters:

echo_retries=1 cifs_max_pending=2

cifs_max_pending=2 is the most important one: the higher the value, the more often the problem occurs, and 2 is the smallest possible value.

Is there anything I can do on the side of my Linux box in such a situation? I cannot upgrade the NAS firmware, because I already have the latest version and probably no newer one will be released (it is closed-source). I cannot get rid of this NAS either, at least not for some time. The best solution would of course be to make cifs work with my NAS anyway, but that is up to you, as I do not have enough knowledge about it.

-- 
Łukasz Maśko                                    _o)
Lukasz.Masko(at)ipipan.waw.pl                   /\\
Registered Linux User #61028                    _\_V
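The manual workaround from the message can be collected into a small script. It covers three steps: a lazy unmount, a power-cycle of the NAS, and a reload of cifs.ko with echo_retries=1 cifs_max_pending=2. This is a minimal sketch under stated assumptions, not a fix for the hang. The mount point /mnt/nas and the function names are hypothetical. The script assumes procps `ps` and root privileges. Whether `modprobe -r cifs` succeeds still depends on the blocked task being released once the NAS comes back.

```shell
#!/bin/sh
# Hypothetical sketch of the manual workaround described in the message.
# Assumptions: procps ps, root privileges for umount/modprobe, and a
# mount point passed as the first argument (e.g. /mnt/nas, hypothetical).

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List tasks in uninterruptible sleep (STAT starting with D),
# such as the stuck kio_file process.
list_d_state() {
    ps -eo pid=,stat=,comm= | awk '$2 ~ /^D/'
}

recover_cifs_share() {
    if [ -z "$1" ]; then
        echo "usage: recover_cifs_share MOUNTPOINT" >&2
        return 2
    fi
    mnt=$1
    # Step 1: detach the share lazily; the D-state task keeps existing.
    run umount -l "$mnt"
    # Step 2: the NAS is power-cycled by hand; wait for confirmation.
    if [ -z "$DRY_RUN" ]; then
        printf 'Power-cycle the NAS, then press Enter... '
        read -r reply
    fi
    # Step 3: reload cifs.ko with the same parameters as before.
    run modprobe -r cifs
    run modprobe cifs echo_retries=1 cifs_max_pending=2
}
```

For example, `DRY_RUN=1 recover_cifs_share /mnt/nas` only prints the umount and modprobe commands it would run. Running `list_d_state` before and after the reload shows whether the blocked task is gone.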