* CIFS hang
From: Markus Greger @ 2016-02-21 19:07 UTC
To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Hi,

I've mounted two NAS boxes via CIFS on a 3.18.25 kernel client. After
some time they become unavailable from the client, and this impacts the
system greatly (for example, file-save dialogs hang as well). The hang
is not limited to 60 s or 300 s; it seems to last indefinitely (already
more than 100 minutes).

Here is some more information I can provide:

* mount options

noauto,user,soft,iocharset=utf8,cache=none,username=Name,sec=ntlmv2

* ps for the hanging process

ps -lp 20510
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 D     0 20510 20386  0  80   0 -  4936 cifs_r pts/7    00:00:00 ls

* cat /proc/fs/cifs/DebugData

cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.05
Features: dfs lanman posix spnego xattr acl
Active VFS Requests: 4
Servers:
1) Name: 1.1.1.22 Domain: STORM Uses: 1 OS: Unix
NOS: Samba 3.0.32 Capability: 0x80f3fd
SMB session status: 1 TCP status: 4
Local Users To Server: 1 SecMode: 0x3 Req On Wire: 0 In Send: 0 In MaxReq Wait: 0
Shares:
1) \\nas-box2\share Mounts: 1 Type: NTFS DevInfo: 0x0 Attributes: 0x2f
PathComponentMax: 255 Status: 1 type: 0 DISCONNECTED

MIDs:

2) Name: 1.1.1.120 Domain: STORM Uses: 2 OS: Unix
NOS: Samba 4.1.18-3.33.2-3407-SUSE-oS13.1-x86_64
Capability: 0x80f3fd
SMB session status: 1 TCP status: 1
Local Users To Server: 1 SecMode: 0x3 Req On Wire: 0 In Send: 0 In MaxReq Wait: 0
Shares:
1) \\server\share1 Mounts: 1 Type: NTFS DevInfo: 0x20 Attributes: 0x1002f
PathComponentMax: 255 Status: 1 type: DISK DISCONNECTED

2) \\server\share2 Mounts: 1 Type: NTFS DevInfo: 0x20 Attributes: 0x1002f
PathComponentMax: 255 Status: 1 type: DISK

MIDs:

3) Name: 1.1.1.25 Domain: STORM Uses: 3 OS: Unix
NOS: Samba 3.0.32 Capability: 0x80f3fd
SMB session status: 1 TCP status: 3
Local Users To Server: 1 SecMode: 0x3 Req On Wire: 1 In Send: 0 In MaxReq Wait: 0
Shares:
1) \\nas-box1\share1 Mounts: 1 Type: NTFS DevInfo: 0x0 Attributes: 0x2f
PathComponentMax: 255 Status: 1 type: 0 DISCONNECTED

2) \\nas-box1\share2 Mounts: 1 Type: NTFS DevInfo: 0x0 Attributes: 0x2f
PathComponentMax: 255 Status: 1 type: 0 DISCONNECTED

3) \\nas-box1\share3 Mounts: 1 Type: NTFS DevInfo: 0x0 Attributes: 0x2f
PathComponentMax: 255 Status: 1 type: 0 DISCONNECTED

MIDs:

* messages

2016-02-21T16:49:21.432423+01:00 client kernel: [549835.920336] INFO: task ls:20510 blocked for more than 10 seconds.
2016-02-21T16:49:21.432424+01:00 client kernel: [549835.920337] Tainted: G W 3.18.25-desktop #4
2016-02-21T16:49:21.432425+01:00 client kernel: [549835.920338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2016-02-21T16:49:21.432426+01:00 client kernel: [549835.920340] ls D ffff8801bfc12f00 0 20510 20386 0x00000000
2016-02-21T16:49:21.432427+01:00 client kernel: [549835.920342] ffff88012f48fa98 0000000000000086 ffff8801b4f24150 ffff88012f48ffd8
2016-02-21T16:49:21.432428+01:00 client kernel: [549835.920344] 0000000000012f00 0000000000012f00 ffff88015ced2710 ffff8801b4f24150
2016-02-21T16:49:21.432429+01:00 client kernel: [549835.920346] 0000000000000000 ffff88005d65ca20 ffff88005d65ca24 ffff8801b4f24150
2016-02-21T16:49:21.432430+01:00 client kernel: [549835.920349] Call Trace:
2016-02-21T16:49:21.432431+01:00 client kernel: [549835.920352] [<ffffffff816ae43c>] schedule_preempt_disabled+0x2c/0x80
2016-02-21T16:49:21.432433+01:00 client kernel: [549835.920355] [<ffffffff816afd55>] __mutex_lock_slowpath+0xc5/0x130
2016-02-21T16:49:21.432434+01:00 client kernel: [549835.920357] [<ffffffff816afdd6>] mutex_lock+0x16/0x2a
2016-02-21T16:49:21.432435+01:00 client kernel: [549835.920362] [<ffffffffc0cb3feb>] cifs_reconnect_tcon+0x15b/0x2e0 [cifs]
2016-02-21T16:49:21.432436+01:00 client kernel: [549835.920366] [<ffffffff81093f45>] ? set_next_entity+0x95/0xb0
2016-02-21T16:49:21.432437+01:00 client kernel: [549835.920370] [<ffffffffc0cb4219>] smb_init+0x29/0x50 [cifs]
2016-02-21T16:49:21.432438+01:00 client kernel: [549835.920375] [<ffffffffc0cba50a>] CIFSSMBUnixQPathInfo+0x6a/0x2b0 [cifs]
2016-02-21T16:49:21.432439+01:00 client kernel: [549835.920382] [<ffffffffc0ccfe47>] cifs_get_inode_info_unix+0x77/0x1c0 [cifs]
2016-02-21T16:49:21.432440+01:00 client kernel: [549835.920385] [<ffffffff811b52f6>] ? path_lookupat+0x66/0x740
2016-02-21T16:49:21.432441+01:00 client kernel: [549835.920391] [<ffffffffc0cc53e1>] ? build_path_from_dentry+0xb1/0x2b0 [cifs]
2016-02-21T16:49:21.432442+01:00 client kernel: [549835.920397] [<ffffffffc0cc5473>] ? build_path_from_dentry+0x143/0x2b0 [cifs]
2016-02-21T16:49:21.432443+01:00 client kernel: [549835.920404] [<ffffffffc0cd21c8>] cifs_revalidate_dentry_attr+0xa8/0x1a0 [cifs]
2016-02-21T16:49:21.432444+01:00 client kernel: [549835.920411] [<ffffffffc0cd2372>] cifs_getattr+0x52/0x130 [cifs]
2016-02-21T16:49:21.432445+01:00 client kernel: [549835.920414] [<ffffffff810586ac>] ? __do_page_fault+0x22c/0x580
2016-02-21T16:49:21.432446+01:00 client kernel: [549835.920416] [<ffffffff811ad5b7>] vfs_getattr_nosec+0x27/0x40
2016-02-21T16:49:21.432447+01:00 client kernel: [549835.920419] [<ffffffff811ad668>] vfs_getattr+0x28/0x30
2016-02-21T16:49:21.432448+01:00 client kernel: [549835.920421] [<ffffffff811ad72d>] vfs_fstatat+0x5d/0xa0
2016-02-21T16:49:21.432449+01:00 client kernel: [549835.920424] [<ffffffff811adb62>] SYSC_newlstat+0x22/0x40
2016-02-21T16:49:21.432450+01:00 client kernel: [549835.920426] [<ffffffff81058a22>] ? do_page_fault+0x22/0x30
2016-02-21T16:49:21.432451+01:00 client kernel: [549835.920429] [<ffffffff816b4398>] ? page_fault+0x28/0x30
2016-02-21T16:49:21.432452+01:00 client kernel: [549835.920432] [<ffffffff811add69>] SyS_newlstat+0x9/0x10
2016-02-21T16:49:21.432453+01:00 client kernel: [549835.920434] [<ffffffff816b234d>] system_call_fastpath+0x16/0x1b

* Wireshark didn't show any traffic from the client to nas-box1, only
some membership broadcasts, name queries and other broadcast messages.
Specifically, there was no SMB (ECHO) message from the client to
nas-box1.

My questions are:

* Why did the "soft" option not result in my processes (like ls)
returning errors? (strace on ls showed nothing at all, and kill -9
doesn't work.)
* What could cause this state?
I have the feeling DNS sometimes "forgets" about these boxes; at least
nslookup won't return any IP. The boxes are fine, however, and can be
pinged successfully, too.
* Is it possible to reactivate these "dead" connections, or do I have
to unmount them (e.g. via umount -a -f -t cifs) and then remount?

Thanks,

Markus
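Markus located the stuck ls with ps -l, which shows state D (uninterruptible sleep) and a truncated WCHAN of cifs_r. When several programs hang at once, as with the save dialogs here, a scan of /proc can list every such task with its full wait-channel name. The following is a minimal Python sketch, assuming the usual Linux /proc/<pid>/stat, /proc/<pid>/wchan and /proc/<pid>/stack files; the function names are hypothetical, and wchan or stack may be unreadable depending on privileges and kernel configuration.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one line of /proc/<pid>/stat.

    comm is in parentheses and may itself contain spaces or ')', so split
    on the last ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    return int(pid_str), comm, tail.split()[0]


def read_text(path):
    """Return a /proc file's contents, or None if it is gone or unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for every process currently in state D."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/fs, ...
        stat = read_text(os.path.join(proc, entry, "stat"))
        if stat is None:
            continue  # process exited while we were scanning
        pid, comm, state = parse_stat(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc, entry, "wchan"))
            yield pid, comm, (wchan or "?").strip()


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
        # Full kernel stack of the task; usually readable only by root.
        stack = read_text(f"/proc/{pid}/stack")
        if stack:
            print(stack, end="")
```

Run as root, the stack output should carry roughly the same call chain as the hung-task report in the messages log, without waiting for the watchdog to fire.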
* Re: CIFS hang
From: Shirish Pargaonkar @ 2016-02-22 4:36 UTC
To: Markus Greger; +Cc: linux-cifs

There may be some kind of deadlock; that is why the soft mount option
is not having any effect. Otherwise you would have seen something like
"host is down".

All of the shares show DISCONNECTED in DebugData.

Can you use the crash utility and look at the stack trace of the ls
command that hangs?

What if you mounted just one share/server (1.1.1.22, e.g.)? Do you see
the same problem?

On Sun, Feb 21, 2016 at 1:07 PM, Markus Greger <Markus.Greger-hi6Y0CQ0nG0@public.gmane.org> wrote:
> Hi,
>
> I've mounted two nas boxes via cifs on a 3.18.25 kernel client. After
> some time these get unavailable from the client and this impacts the
> system greatly (for example dialogs to save files hang as well). The
> hang is not limited to 60s or 300s but seems to hang infinitely
> (already > 100 minutes).
> [...]
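Shirish's DebugData observation can be checked mechanically. The sketch below parses the text of /proc/fs/cifs/DebugData in the layouts shown in this thread and reports, per server, the TCP status and the shares flagged DISCONNECTED. The decoding of the numeric TCP status is an assumption: it mirrors enum statusEnum in fs/cifs/cifsglob.h as found in 3.18/4.5-era kernels, and later kernels added more states, so verify the table against the source of the running kernel. The function names are hypothetical.

```python
import re

# Assumption: values of enum statusEnum in fs/cifs/cifsglob.h on 3.18/4.5-era
# kernels. Newer kernels add states, so check this against your kernel source.
TCP_STATUS = {
    0: "CifsNew",
    1: "CifsGood",
    2: "CifsExiting",
    3: "CifsNeedReconnect",
    4: "CifsNeedNegotiate",
}

# "1) Name: 1.1.1.22 ..." (3.18) or "1) entry for 192.168.1.2 ..." (4.5)
SERVER_RE = re.compile(r"\d+\) (?:Name: |entry for )(\S+)")
# "1) \\nas-box2\share Mounts: 1 ..."
SHARE_RE = re.compile(r"\d+\) (\\\\\S+)")
TCP_RE = re.compile(r"TCP status: (\d+)")


def parse_debugdata(text):
    """Return a list of servers with their TCP status and shares."""
    servers = []
    share = None
    for line in (raw.strip() for raw in text.splitlines()):
        m = SERVER_RE.match(line)
        if m:
            servers.append({"server": m.group(1), "tcp_status": None, "shares": []})
            share = None
            continue
        if not servers:
            continue  # header lines before the first server entry
        m = TCP_RE.search(line)
        if m:
            servers[-1]["tcp_status"] = int(m.group(1))
        m = SHARE_RE.match(line)
        if m:
            share = {"share": m.group(1), "disconnected": False}
            servers[-1]["shares"].append(share)
        # DISCONNECTED may appear on the share line or on a following line.
        if share is not None and "DISCONNECTED" in line:
            share["disconnected"] = True
    return servers


def report(text):
    """Yield one summary line per server."""
    for srv in parse_debugdata(text):
        code = srv["tcp_status"]
        name = TCP_STATUS.get(code, "unknown") if code is not None else "n/a"
        down = ", ".join(s["share"] for s in srv["shares"] if s["disconnected"])
        yield f'{srv["server"]}: TCP status {code} ({name}); disconnected: {down or "none"}'
```

On a live system it could be fed with `open("/proc/fs/cifs/DebugData").read()`. Under the enum assumption above, Markus's output would read as 1.1.1.22 in CifsNeedNegotiate (4) and 1.1.1.25 in CifsNeedReconnect (3); \\server\share2 on 1.1.1.120 is the one share not flagged.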
* Re: CIFS hang
From: Markus Greger @ 2016-07-19 9:57 UTC
To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Hello,

I had to reboot the machine, and since then the problem has not
recurred, although it had happened a number of times before. I wonder
whether it might be related to the powerLAN adapters, since I had to
reboot those as well.

Anyway, just for completeness' sake, here are the answers:

On 22.02.2016 at 05:54, Steve French wrote:
>
> Also would be useful to know if this fails on more recent kernel
>

I'm using 3.18.25 as this was (at that time) a longterm kernel version.
I've recently upgraded to 3.18.36 (currently one of the longterm
kernels). A newer kernel (4.6.3) had some graphics issues, so it's not
usable for me.

> On Feb 21, 2016 22:37, "Shirish Pargaonkar"
> <shirishpargaonkar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> There may be some kind of deadlock, that is why soft mount option is
> not responding,
> otherwise you would seen something like "host is down"
> All of the shares show DISCONNECTED in file DebugData.
> Can you use crash utility and see the stack trace of the ls
> command that hangs?
> What if you mounted from just one share/server (1.1.1.22 e.g.), do you
> see the same problem?
>

I don't have crash installed and am somewhat reluctant to install it, as
it has quite a footprint. I assumed the call trace in the messages log
to be a stack trace. Can I get more useful information by attaching with
gdb? What should I be looking for?

I usually work with both of them. Since the hang only occurs after some
time, I can try mounting the second NAS only when I need it and
unmounting it afterwards. Why do you think that might have an impact?

Thanks a lot for your answers. I'm happy the problem has vanished, at
least for quite some time now.

Regards,

Markus
* CIFS hang
From: Dāvis Mosāns @ 2016-03-27 1:21 UTC
To: linux-cifs-u79uwXL29TY76Z2rM5mHXA
2016-02-21 21:07:07 GMT+02:00 Markus Greger:
> Hi,
>
> I've mounted two nas boxes via cifs on a 3.18.25 kernel client. After
> some time these get unavailable from the client and this impacts the
> system greatly (for example dialogs to save files hang as well). The
> hang is not limited to 60s or 300s but seems to hang infinitely
> (already > 100 minutes).
I'm also getting CIFS hangs, but on Arch Linux with the 4.5.0 kernel and
cifs-utils 6.4.
I'm mounting \\192.168.1.2\Data$ (which is a share on Windows 10) on
/mnt/Data with options
credentials=/etc/samba/credentials,iocharset=utf8,vers=3.0,uid=user,gid=group,file_mode=0770,dir_mode=0770,noauto
$ cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.08
Features: dfs fscache lanman posix spnego xattr acl
Active VFS Requests: 0
Servers:
1) entry for 192.168.1.2 not fully displayed
TCP status: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0
Shares:
1) \\192.168.1.2\Data$ Mounts: 1 DevInfo: 0x60020 Attributes: 0xc700ff
PathComponentMax: 255 Status: 1 type: DISK
Share Capabilities: None Aligned, Partition Aligned, Share
Flags: 0x0 Optimal sector size: 0x1000
MIDs:
When I execute the following (or when any application tries to access /mnt)
$ ls /mnt
it hangs and can't be stopped even with ^C:
[47525.047817] INFO: task ls:11878 blocked for more than 120 seconds.
[47525.047819] Tainted: P O 4.5.0-ARCH-dirty #1
[47525.047820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47525.047822] ls D ffff88029477f968 0 11878 1 0x00000004
[47525.047825] ffff88029477f968 0000000000000000 ffff880386e8aac0 ffff88029f8cc740
[47525.047828] ffff880294780000 ffff88020d7e9424 ffff88029f8cc740 00000000ffffffff
[47525.047831] ffff88020d7e9428 ffff88029477f980 ffffffff815947ac ffff88020d7e9420
[47525.047834] Call Trace:
[47525.047837] [<ffffffff815947ac>] schedule+0x3c/0x90
[47525.047840] [<ffffffff81594b85>] schedule_preempt_disabled+0x15/0x20
[47525.047842] [<ffffffff8159603e>] __mutex_lock_slowpath+0xce/0x140
[47525.047845] [<ffffffff815960c7>] mutex_lock+0x17/0x30
[47525.047852] [<ffffffffa144684c>] small_smb2_init+0x18c/0x3f0 [cifs]
[47525.047855] [<ffffffff811c50ee>] ? kmem_cache_alloc_trace+0x1de/0x200
[47525.047862] [<ffffffffa14479b9>] SMB2_open+0x79/0x8f0 [cifs]
[47525.047870] [<ffffffffa1439e56>] ? cifsConvertToUTF16+0x156/0x2f0 [cifs]
[47525.047878] [<ffffffffa143a0b1>] ? cifs_strndup_to_utf16+0xc1/0x110 [cifs]
[47525.047884] [<ffffffffa1449ebd>] smb2_open_op_close+0xad/0x1e0 [cifs]
[47525.047887] [<ffffffff811b9a0c>] ? alloc_pages_current+0x8c/0x110
[47525.047890] [<ffffffff8116a209>] ? alloc_kmem_pages+0x19/0x90
[47525.047893] [<ffffffff8118a16e>] ? kmalloc_order_trace+0x2e/0x100
[47525.047899] [<ffffffffa144a0f5>] smb2_query_path_info+0x85/0x180 [cifs]
[47525.047907] [<ffffffffa1432f98>] cifs_get_inode_info+0x368/0x660 [cifs]
[47525.047910] [<ffffffff811f6504>] ? putname+0x54/0x60
[47525.047913] [<ffffffff811c48be>] ? __kmalloc+0x2e/0x250
[47525.047915] [<ffffffff811f68e6>] ? filename_lookup+0xc6/0x140
[47525.047923] [<ffffffffa1429976>] ? build_path_from_dentry+0xb6/0x210 [cifs]
[47525.047930] [<ffffffffa14299e9>] ? build_path_from_dentry+0x129/0x210 [cifs]
[47525.047938] [<ffffffffa14349aa>] cifs_revalidate_dentry_attr+0xda/0xf0 [cifs]
[47525.047945] [<ffffffffa1434a71>] cifs_getattr+0x51/0x110 [cifs]
[47525.047948] [<ffffffff811ec2d9>] vfs_getattr_nosec+0x29/0x40
[47525.047951] [<ffffffff811ec4f6>] vfs_getattr+0x26/0x30
[47525.047953] [<ffffffff811ec5d8>] vfs_fstatat+0x78/0xc0
[47525.047956] [<ffffffff811ecb26>] SyS_newlstat+0x36/0x70
[47525.047959] [<ffffffff8159832e>] entry_SYSCALL_64_fastpath+0x12/0x6d
Note that this happens only sometimes, and it seems to be related to Samba:
as soon as I stop smbd, ls unblocks and I get a message saying the host is
down, but when I then run "ls" again, it works and successfully shows the
share's files/folders.
Also, I don't know which side hangs first, but at the same time Windows 10
also becomes basically unusable: many processes hang, crash or stop working,
such as explorer.exe, Task Manager, PowerShell and everything else that tries
to access these Arch Linux shares. But that happens only when the shares are
accessed by computer name; when the IP address is used directly, it works
fine. Also, as soon as I stop smbd, Windows starts to respond again and
everything works, and after starting smbd again all shares work. But one day
that didn't solve it: even after an smbd restart, Windows Explorer still hung
when trying to access the Arch Linux shares by computer name, and only
rebooting Linux fixed it.
It really happens randomly and not often, so I have no idea what causes it,
and I don't know what is hanging what.
But I see some possible bugs here: CIFS shouldn't hang even if the remote
host isn't responding or has hung, and likewise Windows shouldn't wait
forever on shares, as that freezes basically all applications that are
accessing them. There might also be a Samba bug...
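Both reports point at name resolution: Markus saw nslookup sometimes return no address for the NAS boxes, and here the hangs occur only when shares are reached by computer name, not by IP. Before mounting or browsing by name, a script on the Linux side could check whether the name still resolves to the expected address, without itself hanging on a slow resolver. A minimal Python sketch, assuming the system resolver is what matters (Samba's name resolve order with WINS or broadcast lookups is not covered); resolve and check are hypothetical names.

```python
import concurrent.futures
import socket


def resolve(host, timeout=2.0):
    """Return the sorted addresses for host, or None on failure or timeout.

    socket.getaddrinfo() takes no timeout, so the lookup runs in a worker
    thread and we stop waiting for it after `timeout` seconds.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(socket.getaddrinfo, host, None)
    try:
        infos = future.result(timeout=timeout)
    except (concurrent.futures.TimeoutError, OSError):
        return None  # timed out, or socket.gaierror (a subclass of OSError)
    finally:
        # Don't wait for a stuck lookup; its worker thread ends when the
        # resolver eventually gives up.
        pool.shutdown(wait=False)
    return sorted({info[4][0] for info in infos})


def check(host, expected_ip, timeout=2.0):
    """Classify a name as 'ok', 'mismatch' or 'unresolved'."""
    addrs = resolve(host, timeout)
    if addrs is None:
        return "unresolved"
    return "ok" if expected_ip in addrs else "mismatch"
```

With the pairing from Markus's DebugData, `check("nas-box1", "1.1.1.25")` would return "unresolved" in the situation he describes, while the IP-based mounts would be unaffected by such a failure at lookup time.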
* Re: CIFS hang
From: Steve French @ 2016-03-27 1:30 UTC
To: Dāvis Mosāns; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

It sounds like you have a three-machine scenario involving Windows, the
cifs client and a Samba server. Can you describe who is the client of
what, and what the relationship is between the Samba server and Windows
in your scenario?

Are you using default mount options?

It is possible we have a bug here, but it is a little confusing why we
have a getattr calling open. Since these should be handle-based calls,
the file would already be open, unless we are in a reconnect scenario
where the server or network had gone down.

On Sat, Mar 26, 2016 at 8:21 PM, Dāvis Mosāns <davispuh@gmail.com> wrote:
> 2016-02-21 21:07:07 GMT+02:00 Markus Greger:
>> Hi,
>>
>> I've mounted two nas boxes via cifs on a 3.18.25 kernel client. After
>> some time these get unavailable from the client and this impacts the
>> system greatly (for example dialogs to save files hang as well). The
>> hang is not limited to 60s or 300s but seems to hang infinitely
>> (already > 100 minutes).
>
> I'm also getting CIFS hung, but on Arch Linux with 4.5.0 kernel and
> cifs-utils 6.4
> [...]

--
Thanks,

Steve
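Steve's point that a getattr should not need an open unless a reconnect is under way fits both traces: in each, ls is parked in mutex_lock() (under cifs_reconnect_tcon on 3.18, under small_smb2_init on 4.5) rather than waiting for a reply on the wire. Assuming the reconnect path of those kernels honors soft only while waiting for the socket to come back, and then takes the session mutex with a plain mutex_lock() (no timeout, not interruptible), a task queued behind a renegotiation that never finishes would stay in state D regardless of soft, and kill -9 would not help. The Python sketch below is only a user-space analogy of that pattern, not cifs code; session_mutex, request and demo are hypothetical names.

```python
import errno
import threading

# Analogy only: stands in for the per-session mutex the cifs client takes
# before renegotiating a dropped connection.
session_mutex = threading.Lock()


def reconnect_holder(started, release):
    """Takes the mutex to renegotiate, then waits for a reply that never comes."""
    with session_mutex:
        started.set()
        release.wait()  # blocks until the "server" finally answers


def request(lock_timeout=None):
    """A later request (e.g. the lstat from ls) that needs the same mutex.

    lock_timeout=None mimics mutex_lock(): wait forever, no error path.
    A number mimics a bounded wait that can fail with -EHOSTDOWN instead.
    """
    if lock_timeout is None:
        session_mutex.acquire()
    elif not session_mutex.acquire(timeout=lock_timeout):
        return -errno.EHOSTDOWN
    try:
        return 0  # the real client would build and send the SMB here
    finally:
        session_mutex.release()


def demo(wait=0.2):
    """Return (bounded request result, whether the unbounded one is still stuck)."""
    started, release = threading.Event(), threading.Event()
    holder = threading.Thread(target=reconnect_holder, args=(started, release))
    holder.start()
    started.wait()

    bounded = request(lock_timeout=wait)

    unbounded = threading.Thread(target=request)
    unbounded.start()
    unbounded.join(wait)
    still_blocked = unbounded.is_alive()

    release.set()  # let the holder finish so every thread can exit
    holder.join()
    unbounded.join()
    return bounded, still_blocked


if __name__ == "__main__":
    print(demo())
```

Here demo() returns (-errno.EHOSTDOWN, True): the bounded wait fails with the "host is down" style error Shirish expected, while the unbounded one is still blocked when checked.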
* Re: CIFS hang [not found] ` <CAH2r5mshFG2FGaRhXnQ4nVKGd_V+1-eQj1K1JPMBBptdvOJL1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-03-27 2:03 ` Dāvis Mosāns [not found] ` <CAH2r5mtPHB=c-9Ds5JV2gXPLawS3HR+FMYDP1bzE=8otgkhEFw@mail.gmail.com> 0 siblings, 1 reply; 11+ messages in thread From: Dāvis Mosāns @ 2016-03-27 2:03 UTC (permalink / raw) To: Steve French; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org 2016-03-27 4:30 GMT+03:00 Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>: > It sounds like you have three machine scenario involving Windows, cifs > client and Samba server. Can you describe who is the client to what > and what the relationship is between Samba server and Windows in your > scenario. > > Are you using default mount options? > There's 2 machines, Windows 10 (192.168.1.2 and 192.168.1.3 it have 2 Ethernet ports) with share Data$ And Arch Linux which mounts that share and who also have nmbd, winbindd and smbd running with several shares [global] server string = Server workgroup = WORK domain master = Yes preferred master = Yes log file = /var/log/samba/%m.log max log size = 2000 printcap name = /etc/printcap name resolve order = host lmhosts wins bcast time server = Yes unix extensions = No security = USER dns proxy = No idmap config * : backend = tdb store dos attributes = Yes map acl inherit = Yes hosts allow = 192.168.0. 192.168.1. 127. inherit acls = Yes inherit permissions = Yes vfs objects = streams_xattr [Share$] comment = Share path = /mnt/Share wide links = Yes create mask = 0760 directory mask = 0750 read only = No valid users = @Share and so on similar shares. In Windows 10 then these shares are mapped as drive letters. So basically both PCs does access and use each other shares in same time. There actually are other Windows machines too which also use these shares and that Windows 10 share but I don't consider them important for this issue. 
Another backtrace is a bit different, but still looks about the same:

[47525.047644] INFO: task ls:11662 blocked for more than 120 seconds.
[47525.047649] Tainted: P O 4.5.0-ARCH-dirty #1
[47525.047651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47525.047653] ls D ffff8803d0cdb968 0 11662 8840 0x00000004
[47525.047658] ffff8803d0cdb968 0000000000000000 ffff880612f2d580 ffff88029f8c8000
[47525.047662] ffff8803d0cdc000 ffff88020d7e9424 ffff88029f8c8000 00000000ffffffff
[47525.047664] ffff88020d7e9428 ffff8803d0cdb980 ffffffff815947ac ffff88020d7e9420
[47525.047667] Call Trace:
[47525.047675] [<ffffffff815947ac>] schedule+0x3c/0x90
[47525.047678] [<ffffffff81594b85>] schedule_preempt_disabled+0x15/0x20
[47525.047681] [<ffffffff8159603e>] __mutex_lock_slowpath+0xce/0x140
[47525.047684] [<ffffffff815960c7>] mutex_lock+0x17/0x30
[47525.047695] [<ffffffffa144684c>] small_smb2_init+0x18c/0x3f0 [cifs]
[47525.047700] [<ffffffff811c50ee>] ? kmem_cache_alloc_trace+0x1de/0x200
[47525.047707] [<ffffffffa14479b9>] SMB2_open+0x79/0x8f0 [cifs]
[47525.047714] [<ffffffffa1439e56>] ? cifsConvertToUTF16+0x156/0x2f0 [cifs]
[47525.047723] [<ffffffffa143a0b1>] ? cifs_strndup_to_utf16+0xc1/0x110 [cifs]
[47525.047729] [<ffffffffa1449ebd>] smb2_open_op_close+0xad/0x1e0 [cifs]
[47525.047732] [<ffffffff811b9a0c>] ? alloc_pages_current+0x8c/0x110
[47525.047736] [<ffffffff8116a209>] ? alloc_kmem_pages+0x19/0x90
[47525.047739] [<ffffffff8118a16e>] ? kmalloc_order_trace+0x2e/0x100
[47525.047746] [<ffffffffa144a0f5>] smb2_query_path_info+0x85/0x180 [cifs]
[47525.047754] [<ffffffffa1432f98>] cifs_get_inode_info+0x368/0x660 [cifs]
[47525.047757] [<ffffffff811f6504>] ? putname+0x54/0x60
[47525.047760] [<ffffffff811c48be>] ? __kmalloc+0x2e/0x250
[47525.047762] [<ffffffff811f68e6>] ? filename_lookup+0xc6/0x140
[47525.047770] [<ffffffffa1429976>] ? build_path_from_dentry+0xb6/0x210 [cifs]
[47525.047777] [<ffffffffa14299e9>] ? build_path_from_dentry+0x129/0x210 [cifs]
[47525.047785] [<ffffffffa14349aa>] cifs_revalidate_dentry_attr+0xda/0xf0 [cifs]
[47525.047793] [<ffffffffa1434a71>] cifs_getattr+0x51/0x110 [cifs]
[47525.047796] [<ffffffff811ec2d9>] vfs_getattr_nosec+0x29/0x40
[47525.047798] [<ffffffff811ec4f6>] vfs_getattr+0x26/0x30
[47525.047801] [<ffffffff811ec5d8>] vfs_fstatat+0x78/0xc0
[47525.047804] [<ffffffff811ecb26>] SyS_newlstat+0x36/0x70
[47525.047807] [<ffffffff811f120e>] ? path_put+0x1e/0x30
[47525.047809] [<ffffffff8120bd41>] ? path_getxattr+0x71/0xb0
[47525.047813] [<ffffffff8159832e>] entry_SYSCALL_64_fastpath+0x12/0x6d
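In the trace above, the blocked `ls` is sleeping in `mutex_lock`, called from `small_smb2_init`. When reading hung-task reports like this, frames prefixed with `?` are addresses the x86 stack unwinder found on the stack but could not confirm as part of the call chain, so they are usually disregarded. The following is a minimal Python sketch, assuming the report has been copied out of dmesg as plain text, that drops the `?` frames and returns the innermost `[cifs]` frame, i.e. the cifs function that went to sleep. The helper names are hypothetical, not part of any existing tool.

```python
import re

# One call-trace frame, e.g.
#   [<ffffffffa144684c>] small_smb2_init+0x18c/0x3f0 [cifs]
# An optional "? " before the symbol marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'.

    Frames are returned in trace order, innermost (most recent) first.
    module is None for frames in the core kernel image.
    """
    frames = []
    for m in FRAME_RE.finditer(trace_text):
        if m.group("unsure"):
            continue  # speculative frame, not reliably part of the call chain
        frames.append((m.group("func"), m.group("module")))
    return frames

def blocking_cifs_frame(trace_text):
    """Innermost reliable frame belonging to the cifs module, or None."""
    for func, module in reliable_frames(trace_text):
        if module == "cifs":
            return func
    return None
```

Fed the call trace above, `blocking_cifs_frame` should return `small_smb2_init`, the point where the request path waits for a mutex.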
* Re: CIFS hang
From: Dāvis Mosāns @ 2016-03-27 2:34 UTC
To: Steve French, linux-cifs-u79uwXL29TY76Z2rM5mHXA

2016-03-27 5:10 GMT+03:00 Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> How reproducible is it?

Currently it happens randomly and I have no idea what causes it, so I can't
reproduce it; I can only wait until it happens. So far it has happened about
once every few days.

> Do you see a timeout occurring (ie if you look at /proc/fs/cifs/Stats
> and /proc/fs/cifs/DebugData you may see session disconnect)?
>

After it hung and I restarted smbd, everything is working right now, and it
shows:

$ cat /proc/fs/cifs/Stats
Resources in use
CIFS Session: 1
Share (unique mount targets): 1
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0

523 session 5 share reconnects
Total vfs operations: 280 maximum at one time: 3

1) \\192.168.1.2\Data$
SMBs: 30
Negotiates: 0 sent 0 failed
SessionSetups: 0 sent 0 failed
Logoffs: 0 sent 0 failed
TreeConnects: 0 sent 0 failed
TreeDisconnects: 0 sent 0 failed
Creates: 0 sent 2 failed
Closes: 0 sent 0 failed
Flushes: 0 sent 0 failed
Reads: 0 sent 0 failed
Writes: 0 sent 0 failed
Locks: 0 sent 0 failed
IOCTLs: 0 sent 0 failed
Cancels: 0 sent 0 failed
Echos: 0 sent 0 failed
QueryDirectories: 0 sent 2 failed
ChangeNotifies: 0 sent 0 failed
QueryInfos: 0 sent 0 failed
SetInfos: 0 sent 0 failed
OplockBreaks: 0 sent 0 failed

$ cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.08
Features: dfs fscache lanman posix spnego xattr acl
Active VFS Requests: 0
Servers:
1) entry for 192.168.1.2 not fully displayed
TCP status: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0
Shares:
1) \\192.168.1.2\Data$ Mounts: 1 DevInfo: 0x60020 Attributes: 0xc700ff
PathComponentMax: 255 Status: 1 type: DISK
Share Capabilities: None Aligned, Partition Aligned, Share
Flags: 0x0 Optimal sector size: 0x1000

MIDs:

> If the scenario can be narrowed to something small - can you capture a
> network trace and send it to me
>
> https://wiki.samba.org/index.php/Capture_Packets
>
> or another page describing how to do this:
>
> https://wiki.samba.org/index.php/LinuxCIFS_troubleshooting
>
> (the 2nd link also includes information on how to make a more detailed
> trace of dmesg, the kernel messages which also would be useful)

I'll try to when it happens, but because I don't know how to reproduce it, I'll
have to wait until it happens again. Also, after the hang there might not be
any cifs/samba network traffic at all, so I guess I should first find out how
to reproduce it, but I have no idea how. It might be related to some specific
file access pattern, timing or some other event.

Anyway, thanks! I hope we'll be able to figure this one out.
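The most telling lines in this Stats output are the reconnect counters and the total number of VFS operations. As a minimal sketch (Python, with a hypothetical helper name), these can be pulled out of the text of `/proc/fs/cifs/Stats` so the reconnect count can be compared directly with the operation count. The regular expressions assume the line layout of the output posted here and may need adjusting for other cifs versions.

```python
import re

def reconnect_summary(stats_text):
    """Extract reconnect counters and the VFS operation total from the
    text of /proc/fs/cifs/Stats (layout as in cifs 2.08 output)."""
    rec = re.search(r"(\d+)\s+session\s+(\d+)\s+share reconnects", stats_text)
    ops = re.search(r"Total vfs operations:\s*(\d+)", stats_text)
    if rec is None or ops is None:
        raise ValueError("Stats text not in the expected format")
    session, share = int(rec.group(1)), int(rec.group(2))
    total_ops = int(ops.group(1))
    return {
        "session_reconnects": session,
        "share_reconnects": share,
        "vfs_operations": total_ops,
        # More session reconnects than file operations suggests the
        # connection itself is unstable, not just one slow request.
        "reconnects_exceed_ops": session > total_ops,
    }
```

On a live client the input would be the contents of `/proc/fs/cifs/Stats`; for the output posted in this message it reports 523 session reconnects against 280 VFS operations.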
* Re: CIFS hang
From: Steve French @ 2016-03-27 2:40 UTC
To: Dāvis Mosāns; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Well, the good news is that you can increase the trace level for the kernel
cifs tracing and just look at the end of the kernel message log around the
time of the failure; that may give additional information. But fairly clearly
this is reconnection related (you had 523 reconnection attempts and only 280
total file operations), and it is "normal" to hang if the server or network
goes down (depending on "hard" vs. "soft" mount options and the type of
operation), although the hang is usually interruptible. We may be able to
make interruptible some additional code paths that currently aren't, to
allow you to Ctrl-C around server hangs.

On Sat, Mar 26, 2016 at 9:34 PM, Dāvis Mosāns <davispuh@gmail.com> wrote:
> 2016-03-27 5:10 GMT+03:00 Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
>> How reproducible is it?
> [...]

--
Thanks,

Steve
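Whether a hang like this is expected depends partly on the "hard" vs. "soft" mount option. Below is a minimal Python sketch, with hypothetical helper names, that splits a mount option string and reports which of the two was given explicitly. It assumes, as is usual for mount option parsing, that a later option overrides an earlier one, and it does not handle option values that themselves contain commas. Applied to the options from Dāvis's original report, it finds neither, so whatever default the kernel uses applies.

```python
def parse_mount_options(opts):
    """Split a comma-separated mount option string into a dict.

    Flag options without '=' (e.g. noauto, soft) map to True.
    Simplification: values containing commas are not handled.
    """
    result = {}
    for item in opts.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result

def recovery_mode(opts):
    """Return 'hard' or 'soft' if given explicitly, else None.

    Assumption: when both appear, the later one wins.
    """
    mode = None
    for item in opts.split(","):
        if item in ("hard", "soft"):
            mode = item
    return mode

# Options quoted in the original report in this thread.
REPORTED = ("credentials=/etc/samba/credentials,iocharset=utf8,vers=3.0,"
            "uid=user,gid=group,file_mode=0770,dir_mode=0770,noauto")
```

For `REPORTED`, `parse_mount_options` gives `vers` as `"3.0"` (an SMB 3.0 mount, consistent with the `smb2_*` frames in the traces) and `recovery_mode` returns `None`.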
* Re: CIFS hang
From: Shirish Pargaonkar @ 2016-03-27 5:11 UTC
To: Dāvis Mosāns; +Cc: linux-cifs

I think there is a code path in the cifs client where two mutexes can be
held, so the hung process is waiting for the first mutex. When smbd is
killed, the process that holds both mutexes exits, the hung process grabs
the first mutex and eventually exits, and thus the hang clears.

It would be nice if we could obtain stack traces of all the current
processes when a cifs process hangs.

On Sat, Mar 26, 2016 at 8:21 PM, Dāvis Mosāns <davispuh@gmail.com> wrote:
> 2016-02-21 21:07:07 GMT+02:00 Markus Greger:
>> Hi,
>>
>> I've mounted two nas boxes via cifs on a 3.18.25 kernel client. After
>> some time these get unavailable from the client and this impacts the
>> system greatly (for example dialogs to save files hang as well). The
>> hang is not limited to 60s or 300s but seems to hang infinitely (already
>>> 100 minutes).
>
> I'm also getting CIFS hung, but on Arch Linux with 4.5.0 kernel and
> cifs-utils 6.4
>
> I'm mounting \\192.168.1.2\Data$ (which is a share on Windows 10) on
> /mnt/Data with options
>
> credentials=/etc/samba/credentials,iocharset=utf8,vers=3.0,uid=user,gid=group,file_mode=0770,dir_mode=0770,noauto
>
> $ cat /proc/fs/cifs/DebugData
> Display Internal CIFS Data Structures for Debugging
> ---------------------------------------------------
> CIFS Version 2.08
> Features: dfs fscache lanman posix spnego xattr acl
> Active VFS Requests: 0
> Servers:
> 1) entry for 192.168.1.2 not fully displayed
> TCP status: 1
> Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0
> Shares:
> 1) \\192.168.1.2\Data$ Mounts: 1 DevInfo: 0x60020 Attributes: 0xc700ff
> PathComponentMax: 255 Status: 1 type: DISK
> Share Capabilities: None Aligned, Partition Aligned, Share
> Flags: 0x0 Optimal sector size: 0x1000
> MIDs:
>
> when executing (or any application which tries to access /mnt)
>
> $ ls /mnt
>
> hungs and can't be stopped even with ^C
>
> [47525.047817] INFO: task ls:11878 blocked for more than 120 seconds.
> [47525.047819] Tainted: P O 4.5.0-ARCH-dirty #1
> [47525.047820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [47525.047822] ls D ffff88029477f968 0 11878 1 0x00000004
> [47525.047825] ffff88029477f968 0000000000000000 ffff880386e8aac0
> ffff88029f8cc740
> [47525.047828] ffff880294780000 ffff88020d7e9424 ffff88029f8cc740
> 00000000ffffffff
> [47525.047831] ffff88020d7e9428 ffff88029477f980 ffffffff815947ac
> ffff88020d7e9420
> [47525.047834] Call Trace:
> [47525.047837] [<ffffffff815947ac>] schedule+0x3c/0x90
> [47525.047840] [<ffffffff81594b85>] schedule_preempt_disabled+0x15/0x20
> [47525.047842] [<ffffffff8159603e>] __mutex_lock_slowpath+0xce/0x140
> [47525.047845] [<ffffffff815960c7>] mutex_lock+0x17/0x30
> [47525.047852] [<ffffffffa144684c>] small_smb2_init+0x18c/0x3f0 [cifs]
> [47525.047855] [<ffffffff811c50ee>] ? kmem_cache_alloc_trace+0x1de/0x200
> [47525.047862] [<ffffffffa14479b9>] SMB2_open+0x79/0x8f0 [cifs]
> [47525.047870] [<ffffffffa1439e56>] ? cifsConvertToUTF16+0x156/0x2f0 [cifs]
> [47525.047878] [<ffffffffa143a0b1>] ? cifs_strndup_to_utf16+0xc1/0x110 [cifs]
> [47525.047884] [<ffffffffa1449ebd>] smb2_open_op_close+0xad/0x1e0 [cifs]
> [47525.047887] [<ffffffff811b9a0c>] ? alloc_pages_current+0x8c/0x110
> [47525.047890] [<ffffffff8116a209>] ? alloc_kmem_pages+0x19/0x90
> [47525.047893] [<ffffffff8118a16e>] ? kmalloc_order_trace+0x2e/0x100
> [47525.047899] [<ffffffffa144a0f5>] smb2_query_path_info+0x85/0x180 [cifs]
> [47525.047907] [<ffffffffa1432f98>] cifs_get_inode_info+0x368/0x660 [cifs]
> [47525.047910] [<ffffffff811f6504>] ? putname+0x54/0x60
> [47525.047913] [<ffffffff811c48be>] ? __kmalloc+0x2e/0x250
> [47525.047915] [<ffffffff811f68e6>] ? filename_lookup+0xc6/0x140
> [47525.047923] [<ffffffffa1429976>] ? build_path_from_dentry+0xb6/0x210 [cifs]
> [47525.047930] [<ffffffffa14299e9>] ? build_path_from_dentry+0x129/0x210 [cifs]
> [47525.047938] [<ffffffffa14349aa>]
> cifs_revalidate_dentry_attr+0xda/0xf0 [cifs]
> [47525.047945] [<ffffffffa1434a71>] cifs_getattr+0x51/0x110 [cifs]
> [47525.047948] [<ffffffff811ec2d9>] vfs_getattr_nosec+0x29/0x40
> [47525.047951] [<ffffffff811ec4f6>] vfs_getattr+0x26/0x30
> [47525.047953] [<ffffffff811ec5d8>] vfs_fstatat+0x78/0xc0
> [47525.047956] [<ffffffff811ecb26>] SyS_newlstat+0x36/0x70
> [47525.047959] [<ffffffff8159832e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>
> Note that this happens only sometimes and seems it's related to Samba
> because as soon as I stop smbd then it unhungs and I get message about host
> down but then when I try "ls" again it works and successfully shows share's
> files/folders.
>
> Also I don't know who hangs first, but in same time Windows 10 is also basically
> unusable because a lot of things/processes hangs/crashes/stop working like
> explorer.exe, task manger, PowerShell and all other which try to access this
> Arch Linux shares. But that happens only when shares are accessed by computer
> name and when IP address is used directly then it works fine. Also as soon as I
> stop smbd then Windows starts to respond again and everything works. And then
> after starting smbd again all shares start to working. But one day that didn't
> solved it and even after smbd restart Windows explorer still hung when tried to
> access Arch Linux shares by computer name and only rebooting Linux fixed it.
>
> It really happens randomly and not often so I've no idea what causes it
> and I don't know what/who hangs who.
> But I see some possible bugs here, CIFS shouldn't hang even if
> remote host isn't responding/have hung and same for Windows it shouldn't
> wait forever on shares as it freezes basically all applications which are
> accessing them. Also there might be some Samba bug too...
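Shirish's explanation describes a chain of waiters rather than a classic lock cycle: the hung `ls` waits for a mutex whose owner is itself stuck, and the hang clears only when that owner goes away. The following minimal Python sketch follows such a chain; the task and lock names are made up for illustration and nothing here is read from the kernel. It stops at the first task that is not waiting on a held lock, and it raises an error if the chain loops back on itself, which would be a true deadlock.

```python
def wait_chain(task, waits_for, held_by):
    """Follow task -> lock it waits on -> task holding that lock -> ...

    waits_for: dict mapping task -> lock it is blocked on (absent if runnable)
    held_by:   dict mapping lock -> task currently holding it
    Returns the list of tasks in the chain, starting with `task`.
    """
    chain = [task]
    seen = {task}
    while task in waits_for:
        owner = held_by.get(waits_for[task])
        if owner is None:
            break  # the awaited resource is not a held lock; chain ends here
        if owner in seen:
            raise RuntimeError("deadlock cycle: " + " -> ".join(chain + [owner]))
        chain.append(owner)
        seen.add(owner)
        task = owner
    return chain

# Hypothetical snapshot: both ls processes wait on mutex_A, held (together
# with mutex_B) by a task that is itself waiting for a server reply.
EXAMPLE_WAITS = {"ls:11662": "mutex_A", "ls:11878": "mutex_A",
                 "owner_task": "server_reply"}
EXAMPLE_HELD = {"mutex_A": "owner_task", "mutex_B": "owner_task"}
```

With this data, `wait_chain("ls:11878", EXAMPLE_WAITS, EXAMPLE_HELD)` returns `["ls:11878", "owner_task"]`: every waiter resolves to the same owner, whose exit would release them all.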
* Re: CIFS hang
From: Dāvis Mosāns @ 2016-03-27 14:47 UTC
To: Shirish Pargaonkar; +Cc: linux-cifs

2016-03-27 8:11 GMT+03:00 Shirish Pargaonkar <shirishpargaonkar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> [..]
> Would be nice to if we can obtain stack trace of all the current processes when
> a cifs process hangs.
>

What would be the easiest way to accomplish that?
* Re: CIFS hang
From: Shirish Pargaonkar @ 2016-03-28 11:09 UTC
To: Dāvis Mosāns; +Cc: linux-cifs

I have used the crash command, with subcommands such as ps and bt, to look
at processes and stack traces on the live system.

On Sun, Mar 27, 2016 at 9:47 AM, Dāvis Mosāns <davispuh@gmail.com> wrote:
> 2016-03-27 8:11 GMT+03:00 Shirish Pargaonkar <shirishpargaonkar@gmail.com>:
>> [..]
>> Would be nice to if we can obtain stack trace of all the current processes when
>> a cifs process hangs.
>>
>
> What would be the most easiest way to accomplish that?
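Besides crash, the kernel's magic SysRq interface can dump stack traces without extra tools: as root, `echo w > /proc/sysrq-trigger` logs the tasks in uninterruptible (D) state and `echo t > /proc/sysrq-trigger` logs all tasks, both to the kernel log read by dmesg. For a quicker targeted look, here is a minimal Python sketch (helper names are hypothetical) that lists D-state processes from /proc and, where readable, their kernel stacks from `/proc/<pid>/stack`. That file usually requires root and exists only on kernels built with stack-trace support, so the sketch falls back to `None`. It only inspects each process's main thread; per-thread entries under `/proc/<pid>/task` are not scanned.

```python
import os

def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' in the line.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state

def d_state_tasks(proc_root="/proc"):
    """Yield (pid, comm, kernel_stack_or_None) for processes in D state."""
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        task_dir = os.path.join(proc_root, name)
        try:
            with open(os.path.join(task_dir, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unreadable stat
        if state != "D":
            continue
        try:
            with open(os.path.join(task_dir, "stack")) as f:
                stack = f.read()
        except OSError:
            stack = None  # not root, or no /proc/<pid>/stack on this kernel
        yield pid, comm, stack
```

Run as root while `ls /mnt` is hung, `list(d_state_tasks())` should include the blocked `ls` along with its kernel stack, which can then be compared with the hung-task reports in this thread.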