linux-nfs.vger.kernel.org archive mirror
From: Soumya Koduri <skoduri@redhat.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Omar Walid Llorente" <omar@dit.upm.es>,
	"Jeff Layton" <jlayton@poochiereds.net>,
	linux-nfs@vger.kernel.org,
	"administración del centro de cálculo del dit" <cdc@dit.upm.es>
Subject: Re: possible bug in nfs-kernel-server
Date: Mon, 21 Dec 2015 14:18:20 +0530
Message-ID: <5677BCD4.4060009@redhat.com>
In-Reply-To: <20151218200840.GA28692@fieldses.org>


On 12/19/2015 01:38 AM, J. Bruce Fields wrote:
> On Fri, Dec 18, 2015 at 10:47:42PM +0530, Soumya Koduri wrote:
>>
>>
>> On 12/18/2015 08:50 PM, J. Bruce Fields wrote:
>>> On Fri, Dec 18, 2015 at 02:13:40PM +0530, Soumya Koduri wrote:
>>>>
>>>>
>>>> On 12/18/2015 06:07 AM, Malahal Naineni wrote:
>>>>> IIRC, permission checks are done in open(). The write/read syscalls
>>>>> should NOT do many access checks (at least based on POSIX). This is why,
>>>>> once an open is done, you can remove permissions for that process and it
>>>>> should still be able to read/write based on the flags it used when it
>>>>> opened the file.
>>>>>
>>>>> I don't know all the details of this defect, but gluster seems to be
>>>>> doing what it is supposed to do.
>>>>>
>>>> Right. Thanks for the correction. I assumed the behavior should be the
>>>> same for OPEN+WRITE and CREATE+WRITE in the scenario below. But it
>>>> looks like (from 'man creat') the open() call that creates a
>>>> read-only file may well return a read/write file descriptor, which
>>>> is the reason the following WRITE can succeed.
>>>
>>> I forgot another complication, which is that knfsd actually does a
>>> temporary open before each read or write--I assume that's getting
>>> translated into fuse and gluster open operations?
>>>
>> Yes. It is the OPEN done as part of the NFS WRITE that fails with an
>> EACCES error (with both NFSv3 and NFSv4 mounts).
>
> Makes sense for v3, but I wouldn't normally expect the extra temporary
> open on v4 WRITEs.  Could you share any details?
>
I retried the test on a v4 mount using a Fedora 23 machine acting as both
NFS server and client (Linux 4.2.3-300.fc23.x86_64). Please find the packet
trace attached.

  56 07:23:25.567134          ::1 -> ::1          NFS 288 V4 Call WRITE StateID: 0xf934 Offset: 0 Len: 7
  57 07:23:25.567233 192.168.122.17 -> 192.168.122.202 GlusterFS 188 V330 GETXATTR Call
  58 07:23:25.567732 192.168.122.202 -> 192.168.122.17 GlusterFS 112 V330 GETXATTR Reply (Call In 57)
  59 07:23:25.567881 192.168.122.17 -> 192.168.122.202 GlusterFS 164 V330 OPEN Call
  60 07:23:25.568354 192.168.122.202 -> 192.168.122.17 GlusterFS 116 V330 OPEN Reply (Call In 59)
  61 07:23:25.568570          ::1 -> ::1          NFS 144 V4 Reply (Call In 56) WRITE Status: NFS4ERR_ACCESS
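
For comparison, the local behavior discussed in the quoted text (a create of
a read-only file returns a writable descriptor, while a later open for
writing is checked against the mode again) can be reproduced with a short
script. This is only a sketch of the local POSIX semantics, not of what knfsd
or GlusterFS do internally. The function name is made up for illustration,
and the file name 444.txt is borrowed from the reproduction steps quoted
below.

```python
import os
import tempfile


def write_after_readonly_create(directory):
    """Create a 0444 file, write through the create-time fd, then try to reopen it."""
    path = os.path.join(directory, "444.txt")
    # O_CREAT with mode 0444: the file is read-only on disk, but the
    # descriptor returned by this same call is open for writing.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o444)
    try:
        # Access was checked when the file was opened, so this write is allowed.
        written = os.write(fd, b"prueba\n")
    finally:
        os.close(fd)
    # A fresh open for writing is checked against the mode bits again.
    # This is the step that resembles a temporary open done per WRITE.
    try:
        os.close(os.open(path, os.O_WRONLY))
        reopen_errno = None  # e.g. when run as root, which bypasses mode bits
    except PermissionError as exc:
        reopen_errno = exc.errno
    return written, reopen_errno


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_after_readonly_create(d))
```

Run as an unprivileged user on a local filesystem, the write through the
original descriptor should succeed and the second open should fail with
EACCES; as root the second open is expected to succeed as well.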

Thanks,
Soumya

> --b.
>
>>
>>   63 16:59:09.278651000          ::1 -> ::1          NFS 232 V3 WRITE Call, FH: 0x49a35e54 Offset: 0 Len: 7 FILE_SYNC
>>   64 16:59:09.278926000 192.168.122.1 -> 192.168.122.202 GlusterFS 164 V330 OPEN Call
>>   65 16:59:09.278937000 192.168.122.1 -> 192.168.122.202 GlusterFS 164 [RPC retransmission of #64][TCP Retransmission] V330 OPEN Call
>>   66 16:59:09.279459000 192.168.122.202 -> 192.168.122.1 GlusterFS 116 V330 OPEN Reply (Call In 64)
>>   67 16:59:09.279459000 192.168.122.202 -> 192.168.122.1 GlusterFS 116 [RPC duplicate of #66][TCP Retransmission] V330 OPEN Reply (Call In 64)
>>   68 16:59:09.279733000          ::1 -> ::1          NFS 212 V3 WRITE Reply (Call In 63) Error: NFS3ERR_ACCES
>>
>>
>> Thanks,
>> Soumya
>>
>>> In which case it might be worth experimenting with NFSv4 or with Jeff
>>> Layton's filehandle-caching patches.  Neither's a real fix, but that
>>> could help confirm whether it's the temporary opens that are a problem.
>>>
>>> --b.
>>>
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>
>>>>> Regards, Malahal.
>>>>>
>>>>> Soumya Koduri [skoduri@redhat.com] wrote:
>>>>>> As mentioned by Bruce, GlusterFS doesn't have an owner-override rule
>>>>>> except for setattr.
>>>>>>
>>>>>> I did a few experiments to check why this test case passes on a plain
>>>>>> glusterfs fuse mount & NFS-Ganesha but fails with kernel-NFS.
>>>>>>
>>>>>> NFS-Ganesha (for most of the FSALs) seems to pass the actual
>>>>>> request credentials to the back-end filesystem only for
>>>>>> CREATE(-like) and UNLINK fops. For all the remaining fops, it does
>>>>>> the access check at its end and then performs the operation with root
>>>>>> credentials. That's the reason WRITE succeeded in your case, as
>>>>>> NFS-Ganesha (like kernel-NFS) skipped the access check when the
>>>>>> request caller_uid proved to be the file's owner.
>>>>>>
>>>>>> In the case of a native GlusterFS FUSE mount, there is no OPEN fop
>>>>>> involved. WRITE is performed on the fd returned by CREATE. And
>>>>>> strangely, GlusterFS seems to be doing certain access checks only
>>>>>> during OPEN but not for WRITE (this seems like a bug and probably
>>>>>> needs to be fixed in Gluster).
>>>>>>
>>>>>> Thanks,
>>>>>> Soumya
>>>>>>
>>>>>> On 12/14/2015 10:27 PM, Omar Walid Llorente wrote:
>>>>>>>
>>>>>>> Thank you Bruce and others for the responses. I am attaching a complete
>>>>>>> capture of the issue, including the glusterfs transactions.
>>>>>>>
>>>>>>> I hope this helps clarify where the problem may be...
>>>>>>>
>>>>>>> Omar
>>>>>>>
>>>>>>> On 10/12/15 at 15:44, J. Bruce Fields wrote:
>>>>>>>> On Thu, Dec 10, 2015 at 05:59:33PM +0530, Soumya Koduri wrote:
>>>>>>>>>
>>>>>>>>> On 12/10/2015 04:02 PM, Omar Walid Llorente wrote:
>>>>>>>>>> Hi, Jeff, Bruce, finally I got some time to get the capture of the nfs
>>>>>>>>>> packets (you can find them in attached file nfs-problem-nks.pcap.zip).
>>>>>>>>>> Sorry for being so late.
>>>>>>>>>>
>>>>>>>>>> What I did was the following:
>>>>>>>>>>
>>>>>>>>>> 1st) Create the RO file:
>>>>>>>>>> cdc@l056:~/prueba-git$ rm -f kk.txt 444.txt; echo "prueba" > 444.txt;
>>>>>>>>>> chmod 444 444.txt;
>>>>>>>>>>
>>>>>>>>>> 2nd) Init the capture:
>>>>>>>>>> root@l056:~# tcpdump -i eth2 -w /tmp/nfs.pcap -s 512 port 2049
>>>>>>>>>> tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size
>>>>>>>>>> 512 bytes
>>>>>>>>>>
>>>>>>>>> GlusterFS protocol support was added to Wireshark in version 1.8.0 [1].
>>>>>>>>> It may be helpful to see what GlusterFS operations are being processed
>>>>>>>>> as part of the NFS WRITE call (which has failed in this case).
>>>>>>>>>
>>>>>>>>> Could you please try taking the packet trace on the machine where
>>>>>>>>> the NFS server is running (without filtering on the port
>>>>>>>>> number)?
>>>>>>>>>
>>>>>>>>> Also, I tried the same test on a Fedora 22 machine but haven't run
>>>>>>>>> into any issue. What fuse mount options did you use to
>>>>>>>>> mount the gluster volume?
>>>>>>>> Oh, I think this is a simple problem (but maybe hard to fix).  The
>>>>>>>> capture shows NFSv3 traffic like:
>>>>>>>>
>>>>>>>>     CREATE -> OK
>>>>>>>>     SETATTR (mode set to 0400) -> OK
>>>>>>>>     WRITE -> NFS3ERR_ACCES
>>>>>>>>
>>>>>>>> That write would succeed locally (because the mode doesn't matter to a
>>>>>>>> local application that already holds the file open).  It would fail over
>>>>>>>> NFSv3, which doesn't know about the open--except that there's a hack for
>>>>>>>> this case: NFSv3 servers allow IO operations to ignore the mode, if the
>>>>>>>> operation comes from the owner of the file.  NFSv3 clients are then
>>>>>>>> careful to perform necessary access checks on open to ensure that this
>>>>>>>> owner-override rule doesn't grant too many permissions.
>>>>>>>>
>>>>>>>> That allows NFSv3 applications to see behavior that's mostly like a
>>>>>>>> local filesystem, without opening much of a security hole (since the
>>>>>>>> owner could always chmod anyway).
>>>>>>>>
>>>>>>>> So, knfsd is making this special exception--but gluster (which I believe
>>>>>>>> it's exporting in this case, via fuse?)--probably doesn't....  I'm not
>>>>>>>> sure what you can do about that.
>>>>>>>>
>>>>>>>> --b.
>>>>>>>
>>>>>
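
The owner-override rule described in the quoted discussion above can be
modeled roughly as follows. This is a hypothetical simplification for
illustration only: it is not knfsd or GlusterFS code, the function and
parameter names are invented, and group membership is ignored. It only shows
why the same CREATE, SETATTR (mode 0400), WRITE sequence gives different
results depending on whether the layer doing the check applies the override.

```python
def may_write(caller_uid, file_uid, mode, owner_override):
    """Hypothetical model of a server-side access check for a WRITE."""
    if caller_uid == file_uid:
        if owner_override:
            # Owner-override: I/O from the file's owner ignores the mode bits.
            return True
        # No override: the owner needs the owner write bit.
        return bool(mode & 0o200)
    # Simplification: group checks omitted, only the "other" write bit is used.
    return bool(mode & 0o002)


if __name__ == "__main__":
    # Owner writes to a file whose mode was set to 0400 after CREATE.
    print(may_write(1000, 1000, 0o400, owner_override=True))
    print(may_write(1000, 1000, 0o400, owner_override=False))
```

In this model the first call allows the write and the second refuses it,
which corresponds to the NFS3ERR_ACCES seen when the backing filesystem does
its own check without the override.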

[-- Attachment #2: nfs_v4_mount_+_glusterfs.pcap --]
[-- Type: application/vnd.tcpdump.pcap, Size: 22071 bytes --]
