linux-fsdevel.vger.kernel.org archive mirror
* CIFS Unmount Issue
@ 2012-02-03 23:47 Mark Moseley
From: Mark Moseley @ 2012-02-03 23:47 UTC (permalink / raw)
  To: linux-fsdevel

I've got a slew of Netapp Filers talking CIFS to some Debian Squeeze
64-bit boxes. I've noticed that in the kernel switch from 3.1 to 3.2,
the clients are no longer able to unmount a CIFS volume from an older
Filer. The Netapp versions in question are 7.2.7 and 7.0.6. I can
unmount on a 3.2.x kernel from a 7.2.7 Filer just fine. With a 7.0.6
Filer, I get the following error printed to /proc/kmsg:

<3>[  277.363460] CIFS VFS: RFC1001 size 35 smaller than SMB for mid=12
<7>[  277.363466] Bad SMB: : dump of 39 bytes of data at 0xffff880213e7e000
<7>[  277.363472]  23000000 424d53ff 00000074 00018800 . . . # � S M B t . . . . . . .
<7>[  277.363478]  00000000 00000000 00000000 0e1000................<>
27338] 0c0000f0

but the umount call never returns, which makes reboots fun. I've
replicated this on 3.2.1 and 3.2.2. With 3.1.10 I see the same "Bad
SMB..." message as pasted above, but the umount call returns
successfully. And unmounting from the 7.2.7 Filers does not cause a
"Bad SMB" message to be logged to /proc/kmsg.

The client is still responsive, and I can run whatever would be helpful
to debug this. If I'm doing the unmount on the CLI, it hangs on the
'umount' syscall. If I kill the umount command, the mount is gone. As
far as I can see, the unmount is succeeding, but for whatever reason,
the umount system call isn't ever returning. Looking at a network
dump, the last client call is for a logoff, which seems to succeed.

There are no oopses or tracebacks logged.

I can post my whole .config if it's helpful, though for brevity's sake,
here's the CIFS section:

CONFIG_CIFS=m
CONFIG_CIFS_STATS=y
# CONFIG_CIFS_STATS2 is not set
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG2 is not set
CONFIG_CIFS_DFS_UPCALL=y
# CONFIG_CIFS_FSCACHE is not set
# CONFIG_CIFS_ACL is not set

Let me know what I can post to be of help here, or if I should repost
to LKML, or if I should just dust off git bisect. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: CIFS Unmount Issue
@ 2012-02-05 12:32   ` Jeff Layton
From: Jeff Layton @ 2012-02-05 12:32 UTC (permalink / raw)
  To: Mark Moseley
  Cc: linux-fsdevel, linux-cifs

(cc'ing linux-cifs list too)

NetApp filers have a long-standing (for years even) bug with their
handling of SMB_COM_LOGOFF_ANDX. The filer sends a malformed reply on
that command. cifs.ko tends to be a little stricter than Windows about
checking the various lengths in the packet, so it tosses out the
reply.
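The length checking Jeff mentions can be illustrated with a simplified sketch of the kind of comparison behind the "RFC1001 size 35 smaller than SMB" message (the 0x23 in the first dumped word is that 35). It assumes the usual SMB1 layout: a 4-byte RFC1001 session header, a 32-byte SMB header, a WordCount byte, 2 * WordCount parameter bytes, a 2-byte little-endian ByteCount, then ByteCount data bytes. All names are hypothetical; this is not cifs.ko's actual checkSMB() code, and the test frames are synthetic rather than decoded from the partly garbled dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an SMB1 length sanity check. */
#define TOY_RFC1001_HDR_LEN 4u
#define TOY_SMB1_HDR_LEN 32u

/* Length carried in the session header (simplified: low 24 bits, big-endian). */
uint32_t toy_rfc1001_len(const uint8_t *frame)
{
	return ((uint32_t)frame[1] << 16) | ((uint32_t)frame[2] << 8) | frame[3];
}

/* SMB size implied by WordCount/ByteCount. If the frame ends before
 * ByteCount, return the minimum size WordCount alone implies; return 0
 * if even WordCount is missing. */
size_t toy_smb1_calc_size(const uint8_t *frame, size_t frame_len)
{
	size_t wct_off = TOY_RFC1001_HDR_LEN + TOY_SMB1_HDR_LEN;
	if (frame_len < wct_off + 1)
		return 0;

	size_t wct = frame[wct_off];
	size_t bcc_off = wct_off + 1 + 2 * wct;
	size_t fixed = TOY_SMB1_HDR_LEN + 1 + 2 * wct + 2;
	if (frame_len < bcc_off + 2)
		return fixed;

	size_t bcc = (size_t)frame[bcc_off] | ((size_t)frame[bcc_off + 1] << 8);
	return fixed + bcc;
}

/* Nonzero when the session header's length is smaller than what the SMB
 * header itself claims, i.e. the "size N smaller than SMB" situation. */
int toy_smb1_too_short(const uint8_t *frame, size_t frame_len)
{
	assert(frame_len >= TOY_RFC1001_HDR_LEN);
	size_t calc = toy_smb1_calc_size(frame, frame_len);
	return calc == 0 || toy_rfc1001_len(frame) < calc;
}
```

For a synthetic frame whose session header says 35 bytes but whose WordCount claims two parameter words, the implied size is 32 + 1 + 4 + 2 = 39, so the check rejects it. Whether the filer's reply fails in exactly this way can't be read off the dump above.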

I'd suggest filing a bug with netapp on this. You can reference this
(ancient) RH bug if you need more ammo:

    https://bugzilla.redhat.com/show_bug.cgi?id=191112

Now, that said...I think we have a bug in cifs.ko here too. It's
throwing out these replies without waking up the thread that's waiting
on it, even though we were probably able to match it to a request. This
patch will probably fix it, but it's untested and I need to stare at it
a bit more to ensure that it doesn't cause any problems.
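The cifs.ko side of the problem can be pictured with a single-threaded toy model: the requesting task sleeps until its mid's callback runs, and the receive step's return value decides whether the callback runs at all. All names (`toy_mid`, `toy_receive`, `toy_demux_one`) are hypothetical, and the rule "a nonzero return skips the callback" is a simplification of the behavior the commit message below describes, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names echo fs/cifs, but this is NOT the kernel's code. */
enum toy_mid_state { TOY_SUBMITTED, TOY_RECEIVED, TOY_MALFORMED };

struct toy_mid {
	enum toy_mid_state state;
	bool waiter_woken;   /* set by the callback; the requester sleeps until then */
};

/* In the model, running the callback is the only way the waiter wakes. */
static void toy_callback(struct toy_mid *mid)
{
	assert(mid != NULL);
	mid->waiter_woken = true;
}

/* Receive step; check_rc is the validity-check result (0 = ok).
 * patched == false: return check_rc even after the mid was marked.
 * patched == true:  return 0 once a matched mid has been marked. */
static int toy_receive(struct toy_mid *mid, int check_rc, bool patched)
{
	if (mid == NULL)
		return check_rc;
	mid->state = check_rc ? TOY_MALFORMED : TOY_RECEIVED;
	return patched ? 0 : check_rc;
}

/* One pass of a demultiplex loop: a nonzero return skips the callback. */
static void toy_demux_one(struct toy_mid *mid, int check_rc, bool patched)
{
	int rc = toy_receive(mid, check_rc, patched);
	if (rc != 0)
		return;              /* "continue" in a real loop */
	if (mid != NULL)
		toy_callback(mid);
}
```

With the unpatched return path the mid ends up marked malformed but its waiter is never woken, which is consistent with an umount that never returns; with the patched path the callback runs and the waiter can see the malformed status.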

In the meantime if you have a machine where you could test this, that
would be helpful. I'll plan to send it to Steve F. for inclusion in 3.3
and stable once I've smoke tested it a bit more.

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

--------------------------[snip]------------------------

cifs: don't return error from standard_receive3 after marking response malformed

standard_receive3 will check the validity of the response from the
server (via checkSMB). It'll pass the result of that check to handle_mid
which will eventually dequeue it and mark it with a status of
MID_RESPONSE_MALFORMED. At that point, it'll also return an error, which
will make the demultiplex thread skip doing the callback for the mid.

This is wrong -- if we were able to identify the request and the
response is malformed, then we want the demultiplex thread to do the
callback.  Fix this by making standard_receive3 return 0 in this
situation.

Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/connect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index aa687c8..83104a5 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -776,7 +776,7 @@ standard_receive3(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 	if (mid)
 		handle_mid(mid, server, smb_buffer, length);
 
-	return length;
+	return 0;
 }
 
 static int
-- 
1.7.7.6


* Re: CIFS Unmount Issue
@ 2012-02-05 21:10     ` Jeff Layton
From: Jeff Layton @ 2012-02-05 21:10 UTC (permalink / raw)
  To: Mark Moseley; +Cc: linux-fsdevel, linux-cifs

Revised patch. We want to return "length" if the mid wasn't ID'ed:

-------------------------------[snip]------------------------------

cifs: don't return error from standard_receive3 after marking response malformed

standard_receive3 will check the validity of the response from the
server (via checkSMB). It'll pass the result of that check to handle_mid
which will eventually dequeue it and mark it with a status of
MID_RESPONSE_MALFORMED. At that point, it'll also return an error, which
will make the demultiplex thread skip doing the callback for the mid.

This is wrong -- if we were able to identify the request and the
response is now malformed, then we want the demultiplex thread to do the
callback.  Fix this by making standard_receive3 return 0 in this
situation.

Cc: stable@vger.kernel.org
Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/connect.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index aa687c8..4759543 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -773,10 +773,11 @@ standard_receive3(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 		cifs_dump_mem("Bad SMB: ", buf,
 			min_t(unsigned int, server->total_read, 48));
 
-	if (mid)
-		handle_mid(mid, server, smb_buffer, length);
+	if (!mid)
+		return length;
 
-	return length;
+	handle_mid(mid, server, smb_buffer, length);
+	return 0;
 }
 
 static int
-- 
1.7.7.6
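The revised tail of standard_receive3 separates the two cases Jeff calls out. Below is a minimal sketch of that control flow, using hypothetical stand-ins (`toy_mid`, `toy_handle_mid`, `toy_receive_tail`) rather than the real `struct mid_q_entry` and `handle_mid()`. As in the diff, `length` is the validity-check result: an unmatched response still hands it back to the caller, while a matched one has it recorded on the mid and the function returns 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; this sketches the revised control flow only. */
struct toy_mid {
	bool handled;
	int status;          /* validity-check result recorded on the mid */
};

static void toy_handle_mid(struct toy_mid *mid, int length)
{
	assert(mid != NULL);
	mid->handled = true;
	mid->status = length;
}

/* Same shape as the patched tail of standard_receive3:
 * unmatched response -> pass the check result back to the caller;
 * matched response   -> record it on the mid and return 0 so the
 *                       caller goes on to run the mid's callback. */
static int toy_receive_tail(struct toy_mid *mid, int length)
{
	if (!mid)
		return length;

	toy_handle_mid(mid, length);
	return 0;
}
```

Returning 0 only in the matched case keeps an unidentified malformed frame reported as an error, while a request that could be identified still gets its callback.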


-- 
Jeff Layton <jlayton@redhat.com>


* Re: CIFS Unmount Issue
@ 2012-02-07  1:02       ` Mark Moseley
From: Mark Moseley @ 2012-02-07  1:02 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-fsdevel, linux-cifs

Awesome, thanks for the info and thanks for the patch. I can confirm
it works just fine. The umount syscall comes back immediately and
everything looks good.

I'll see what Netapp says, though hopefully we'll be off of 7.0.6 soon anyway.

