From: Kinglong Mee <kinglongmee@gmail.com>
To: Frank Filz <ffilzlnx@mindspring.com>,
'Anna Schumaker' <schumaker.anna@gmail.com>,
'Trond Myklebust' <trondmy@hammerspace.com>,
devel@lists.nfs-ganesha.org, linux-nfs@vger.kernel.org
Subject: Re: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK
Date: Thu, 13 Sep 2018 08:03:51 +0800 [thread overview]
Message-ID: <acda682b-fdc1-3e45-baef-03142bafb2f8@gmail.com> (raw)
In-Reply-To: <012901d44a8f$ed7c6bc0$c8754340$@mindspring.com>
On 2018/9/12 19:58, Frank Filz wrote:
>> On 2018/9/12 7:20, Frank Filz wrote:
>>>> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
>>>>> On 2018/9/11 20:57, Trond Myklebust wrote:
>>>>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
>>>>>>> The latest ganesha/gluster supports seek according to,
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#secti
>>>>>> o
>>>>>> n-15.11
>>>>>>>
>>>>>>> From the given sa_offset, find the next data_content4 of type
>>>>>>> sa_what
>>>>>>> in the file. If the server can not find a corresponding sa_what,
>>>>>>> then the status will still be NFS4_OK, but sr_eof would be
>>>>>>> TRUE. If
>>>>>>> the server can find the sa_what, then the sr_offset is the
>>>>>>> start of
>>>>>>> that content. If the sa_offset is beyond the end of the file,
>>>>>>> then
>>>>>>> SEEK MUST return NFS4ERR_NXIO.
>>>>>>>
>>>>>>> For a file's filemap as,
>>>>>>>
>>>>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
>>>>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
>>>>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
>>>>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
>>>>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, SEEK_HOLE)
>>>>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
>>>>>>>
>>>>>>> If an application depends the lseek result for data searching, it
>>>>>>> may enter infinite loop.
>>>>>>>
>>>>>>> while (1) {
>>>>>>> next_pos = lseek(fd, cur_pos, seek_type);
>>>>>>> if (seek_type == SEEK_DATA) {
>>>>>>> seek_type = SEEK_HOLE;
>>>>>>> } else {
>>>>>>> seek_type = SEEK_DATA;
>>>>>>> }
>>>>>>>
>>>>>>> if (next_pos == -1) {
>>>>>>> return ;
>>>>>>>
>>>>>>> cur_pos = next_pos;
>>>>>>> }
>>>>>>>
>>>>>>> The lseek syscall always gets 0x70000 from nfs client for those
>>>>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the nfs
>>>>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
>>>>>>>
>>>>>>> I wanna to know,
>>>>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the
>>>>>>> first case?
>>>>>>> or should I fix the nfs client to return ENXIO for the first case?
>>>>>>>
>>>>>>
>>>>>> It that test correct? The fallback implementation of SEEK_DATA
>>>>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any
>>>>>> offset that is <= eof will be a no-op. The fallback implementation
>>>>>> of SEEK_HOLE assumes that the first hole is at eof.
>>>>>
>>>>> I think that's non-nfsv4.2's logical.
>>>>>
>>>>>>
>>>>>> IOW: unless the initial value for cur_pos is > eof, it looks to me
>>>>>> as if the above test will loop infinitely given any filesystem that
>>>>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE.
>>>>>>
>>>>>
>>>>> No, if underlying filesystem is ext4/f2fs, or the nfs server is
>>>>> knfsd, the last lseek syscall always return ENXIO no matter the
>>>>> cur_pos is > eof or not.
>>>>>
>>>>> A file ends with a hole as,
>>>>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
>>>>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
>>>>>
>>>>> # stat testfile
>>>>> File: testfile
>>>>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file
>>>>> Device: 807h/2055d Inode: 843122 Links: 2
>>>>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
>>>>> Access: 2018-09-11 20:01:41.881227061 +0800
>>>>> Modify: 2018-09-11 20:01:41.976478311 +0800
>>>>> Change: 2018-09-11 20:01:41.976478311 +0800
>>>>> Birth: -
>>>>>
>>>>> # strace filemap testfile
>>>>> ... ...
>>>>> lseek(3, 111149056, SEEK_HOLE) = 112197632
>>>>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or
>> address)
>>>>>
>>>>> Right now, when knfsd gets the ENXIO error, it returns the error to
>>>>> client directly, and return to syscall.
>>>>> But, ganesha set the sr_eof to true and return NFS4_OK to client as
>>>>> RFC description, nfs client skips the sr_eof and return a bad offset
>>>>> to syscall.
>>>>
>>>> Would it make more sense to change Knfsd instead of the client? I
>>>> think I was trying to keep things simple when I wrote the code, so I
>>>> just passed the result of the lseek system call back to the client.
>>>
>>> Looking at the lseek(2) man page, it's not clear to me what should be returned
>> if as in this example, there is a HOLE at the end of the file (i.e. filesize is larger
>> than the range of the last DATA in the file). It sounds like ext4 returns ENXIO if a
>> SEEK_DATA is done past the last data in the file.
>>>
>>> SEEK_DATA
>>> Adjust the file offset to the next location in the file greater than or
>> equal to offset containing data. If offset points to data, then the file offset is
>> set
>>> to offset.
>>>
>>> SEEK_HOLE
>>> Adjust the file offset to the next hole in the file greater than or equal
>> to offset. If offset points into the middle of a hole, then the file offset is set to
>>> offset. If there is no hole past offset, then the file offset is adjusted to
>> the end of the file (i.e., there is an implicit hole at the end of any file).
>>>
>>> In both of the above cases, lseek() fails if offset points past the end of the
>> file.
>>>
>>> These operations allow applications to map holes in a sparsely allocated
>> file. This can be useful for applications such as file backup tools, which can save
>> space when
>>> creating backups and preserve holes, if they have a mechanism for
>> discovering holes.
>>>
>>> For the purposes of these operations, a hole is a sequence of zeros that
>> (normally) has not been allocated in the underlying file storage. However, a
>> filesystem is not
>>> obliged to report holes, so these operations are not a guaranteed
>> mechanism for mapping the storage space actually allocated to a file.
>> (Furthermore, a sequence of
>>> zeros that actually has been written to the underlying storage may not be
>> reported as a hole.) In the simplest implementation, a filesystem can support
>> the operations
>>> by making SEEK_HOLE always return the offset of the end of the file, and
>> making SEEK_DATA always return offset (i.e., even if the location referred to by
>> offset is a
>>> hole, it can be considered to consist of data that is a sequence of zeros).
>>>
>>> The RFC text is pretty clear:
>>>
>>> SEEK is an operation that allows a client to determine the location
>>> of the next data_content4 in a file. It allows an implementation of
>>> the emerging extension to lseek(2) to allow clients to determine the
>>> next hole whilst in data or the next data whilst in a hole.
>>>
>>> From the given sa_offset, find the next data_content4 of type sa_what
>>> in the file. If the server can not find a corresponding sa_what,
>>> then the status will still be NFS4_OK, but sr_eof would be TRUE. If
>>> the server can find the sa_what, then the sr_offset is the start of
>>> that content. If the sa_offset is beyond the end of the file, then
>>> SEEK MUST return NFS4ERR_NXIO.
>>>
>>> All files MUST have a virtual hole at the end of the file. I.e., if
>>> a filesystem does not support sparse files, then a compound with
>>> {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
>>> where 'X' was the size of the file.
>>>
>>> Sa_offset is not past the end of the file, but there is no more DATA, so a seek
>> DATA from 0x70000 (original file) should return sr_eof TRUE.
>>>
>>> In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000.
>>>
>>> It certainly makes sense that you should be able to have a hole at the end of a
>> file (pre-allocated disk blocks but no data written yet), and is in fact what
>> fallocate(2) will do.
>>>
>>> An NFS server could check the filesize and if sa_offset is < filesize and a
>> SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set sr_eof to
>> TRUE.
>>>
>>> The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any ENXIO
>> result to NFS4_OK with sr_eof TRUE. It would be better for it to do the simple
>> thing knfsd does of always passing along the ENXIO (this may be best if it is not
>> possible to safely verify sa_offset really is < filesize).
>>
>> Do you mean modifying ganesha/gluster as knfsd does?
>>
>> seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
>> if (seek->seek_pos < 0)
>> status = nfserrno(seek->seek_pos);
>> else if (seek->seek_pos >= i_size_read(file_inode(file)))
>> seek->seek_eof = true;
>>
>> It is a working implementation, but not according to RFC description,
>>
>> If the server can not find a corresponding sa_what,
>> then the status will still be NFS4_OK, but sr_eof would be TRUE.
>>
>> As in this example, there is a HOLE at the end of the file, SEEK(in hole,
>> SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return
>> NFS4ERR_NXIO.
>
> FSAL_GLUSTER always translates lseek return of ENXIO to NFS4_Ok with sr_eod TRUE.
>
> It should at least ONLY do that if sa_offset is < filesize (which would then be correct per RFC).
>
> Knfsd, to my understanding, looks like it always just returns ENXIO (which isn't exactly per RFC, but at least doesn't confuse the client and application as badly).
>
Copy that.
I will push a fix as knfsd.
thanks,
Kinglong Mee
next prev parent reply other threads:[~2018-09-13 5:10 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-11 12:29 lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK Kinglong Mee
2018-09-11 12:57 ` Trond Myklebust
2018-09-11 14:47 ` Kinglong Mee
2018-09-11 15:43 ` Anna Schumaker
2018-09-11 23:20 ` [NFS-Ganesha-Devel] " Frank Filz
2018-09-12 1:31 ` Kinglong Mee
2018-09-12 11:58 ` Frank Filz
2018-09-13 0:03 ` Kinglong Mee [this message]
2020-09-14 15:02 ` Frank Filz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=acda682b-fdc1-3e45-baef-03142bafb2f8@gmail.com \
--to=kinglongmee@gmail.com \
--cc=devel@lists.nfs-ganesha.org \
--cc=ffilzlnx@mindspring.com \
--cc=linux-nfs@vger.kernel.org \
--cc=schumaker.anna@gmail.com \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).