linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'Kinglong Mee'" <kinglongmee@gmail.com>,
	"'Anna Schumaker'" <schumaker.anna@gmail.com>,
	"'Trond Myklebust'" <trondmy@hammerspace.com>,
	<devel@lists.nfs-ganesha.org>, <linux-nfs@vger.kernel.org>
Subject: RE: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK
Date: Wed, 12 Sep 2018 04:58:31 -0700	[thread overview]
Message-ID: <012901d44a8f$ed7c6bc0$c8754340$@mindspring.com> (raw)
In-Reply-To: <f096aead-c6d0-2768-8cf4-ebdf0f89c027@gmail.com>

> On 2018/9/12 7:20, Frank Filz wrote:
> >> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
> >>> On 2018/9/11 20:57, Trond Myklebust wrote:
> >>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
> >>>>> The latest ganesha/gluster supports seek according to,
> >>>>>
> >>>>>
> >>>>
> >>>> =
https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#secti
> >>>> o
> >>>> n-15.11
> >>>>>
> >>>>>    From the given sa_offset, find the next data_content4 of type
> >>>>> sa_what
> >>>>>    in the file.  If the server can not find a corresponding =
sa_what,
> >>>>>    then the status will still be NFS4_OK, but sr_eof would be
> >>>>> TRUE.  If
> >>>>>    the server can find the sa_what, then the sr_offset is the
> >>>>> start of
> >>>>>    that content.  If the sa_offset is beyond the end of the =
file,
> >>>>> then
> >>>>>    SEEK MUST return NFS4ERR_NXIO.
> >>>>>
> >>>>> For a file's filemap as,
> >>>>>
> >>>>> Part    1: HOLE 0x0000000000000000 ---> 0x0000000000600000
> >>>>> Part    2: DATA 0x0000000000600000 ---> 0x0000000000700000
> >>>>> Part    3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
> >>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
> >>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, =
SEEK_HOLE)
> >>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
> >>>>>
> >>>>> If an application depends the lseek result for data searching, =
it
> >>>>> may enter infinite loop.
> >>>>>
> >>>>>         while (1) {
> >>>>>                 next_pos =3D lseek(fd, cur_pos, seek_type);
> >>>>>                 if (seek_type =3D=3D SEEK_DATA) {
> >>>>>                         seek_type =3D SEEK_HOLE;
> >>>>>                 } else {
> >>>>>                         seek_type =3D SEEK_DATA;
> >>>>>                 }
> >>>>>
> >>>>>                 if (next_pos =3D=3D -1) {
> >>>>>                         return ;
> >>>>>
> >>>>>                 cur_pos =3D next_pos;
> >>>>>         }
> >>>>>
> >>>>> The lseek syscall always gets 0x70000 from nfs client for those
> >>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the =
nfs
> >>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
> >>>>>
> >>>>> I wanna to know,
> >>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the
> >>>>> first case?
> >>>>> or should I fix the nfs client to return ENXIO for the first =
case?
> >>>>>
> >>>>
> >>>> It that test correct? The fallback implementation of SEEK_DATA
> >>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any
> >>>> offset that is <=3D eof will be a no-op. The fallback =
implementation
> >>>> of SEEK_HOLE assumes that the first hole is at eof.
> >>>
> >>> I think that's non-nfsv4.2's logical.
> >>>
> >>>>
> >>>> IOW: unless the initial value for cur_pos is > eof, it looks to =
me
> >>>> as if the above test will loop infinitely given any filesystem =
that
> >>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE.
> >>>>
> >>>
> >>> No, if underlying filesystem is ext4/f2fs, or the nfs server is
> >>> knfsd, the last lseek syscall always return ENXIO no matter the
> >>> cur_pos is > eof or not.
> >>>
> >>> A file ends with a hole as,
> >>> Part   22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
> >>> Part   23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
> >>>
> >>> # stat testfile
> >>>   File: testfile
> >>>   Size: 209715200       Blocks: 22640      IO Block: 4096   =
regular file
> >>> Device: 807h/2055d      Inode: 843122      Links: 2
> >>> Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    =
root)
> >>> Access: 2018-09-11 20:01:41.881227061 +0800
> >>> Modify: 2018-09-11 20:01:41.976478311 +0800
> >>> Change: 2018-09-11 20:01:41.976478311 +0800
> >>>  Birth: -
> >>>
> >>> # strace filemap testfile
> >>> ... ...
> >>> lseek(3, 111149056, SEEK_HOLE)          =3D 112197632
> >>> lseek(3, 112197632, SEEK_DATA)          =3D -1 ENXIO (No such =
device or
> address)
> >>>
> >>> Right now, when knfsd gets the ENXIO error, it returns the error =
to
> >>> client directly, and return to syscall.
> >>> But, ganesha set the sr_eof to true and return NFS4_OK to client =
as
> >>> RFC description, nfs client skips the sr_eof and return a bad =
offset
> >>> to syscall.
> >>
> >> Would it make more sense to change Knfsd instead of the client?  I
> >> think I was trying to keep things simple when I wrote the code, so =
I
> >> just passed the result of the lseek system call back to the client.
> >
> > Looking at the lseek(2) man page, it's not clear to me what should =
be returned
> if as in this example, there is a HOLE at the end of the file (i.e. =
filesize is larger
> than the range of the last DATA in the file). It sounds like ext4 =
returns ENXIO if a
> SEEK_DATA is done past the last data in the file.
> >
> >        SEEK_DATA
> >               Adjust the file offset to the next location in the =
file greater than or
> equal to offset containing data.  If offset points to data, then the =
file offset  is
> set
> >               to offset.
> >
> >        SEEK_HOLE
> >               Adjust  the file offset to the next hole in the file =
greater than or equal
> to offset.  If offset points into the middle of a hole, then the file =
offset is set to
> >               offset.  If there is no hole past offset, then the =
file offset is adjusted to
> the end of the file (i.e., there is an implicit hole at the end of any =
file).
> >
> >        In both of the above cases, lseek() fails if offset points =
past the end of the
> file.
> >
> >        These operations allow applications to map holes in a =
sparsely allocated
> file.  This can be useful for applications such as file backup tools, =
which can save
> space when
> >        creating backups and preserve holes, if they have a mechanism =
for
> discovering holes.
> >
> >        For the purposes of these operations, a hole is a sequence of =
zeros that
> (normally) has not been allocated in the underlying file storage.  =
However, a
> filesystem is not
> >        obliged to report holes, so these operations are not a =
guaranteed
> mechanism for mapping the storage space actually allocated to a file.
> (Furthermore,  a  sequence  of
> >        zeros  that actually has been written to the underlying =
storage may not be
> reported as a hole.)  In the simplest implementation, a filesystem can =
support
> the operations
> >        by making SEEK_HOLE always return the offset of the end of =
the file, and
> making SEEK_DATA always return offset (i.e., even if the location =
referred to by
> offset  is  a
> >        hole, it can be considered to consist of data that is a =
sequence of zeros).
> >
> > The RFC text is pretty clear:
> >
> >    SEEK is an operation that allows a client to determine the =
location
> >    of the next data_content4 in a file.  It allows an implementation =
of
> >    the emerging extension to lseek(2) to allow clients to determine =
the
> >    next hole whilst in data or the next data whilst in a hole.
> >
> >    From the given sa_offset, find the next data_content4 of type =
sa_what
> >    in the file.  If the server can not find a corresponding sa_what,
> >    then the status will still be NFS4_OK, but sr_eof would be TRUE.  =
If
> >    the server can find the sa_what, then the sr_offset is the start =
of
> >    that content.  If the sa_offset is beyond the end of the file, =
then
> >    SEEK MUST return NFS4ERR_NXIO.
> >
> >    All files MUST have a virtual hole at the end of the file.  I.e., =
if
> >    a filesystem does not support sparse files, then a compound with
> >    {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
> >    where 'X' was the size of the file.
> >
> > Sa_offset is not past the end of the file, but there is no more =
DATA, so a seek
> DATA from 0x70000 (original file) should return sr_eof TRUE.
> >
> > In either RFC or lseek(2), a seek HOLE for 0x70000 will return =
0x70000.
> >
> > It certainly makes sense that you should be able to have a hole at =
the end of a
> file (pre-allocated disk blocks but no data written yet), and is in =
fact what
> fallocate(2) will do.
> >
> > An NFS server could check the filesize and if sa_offset is < =
filesize and a
> SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set =
sr_eof to
> TRUE.
> >
> > The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any =
ENXIO
> result to NFS4_OK with sr_eof TRUE. It would be better for it to do =
the simple
> thing knfsd does of always passing along the ENXIO (this may be best =
if it is not
> possible to safely verify sa_offset really is < filesize).
>=20
> Do you mean modifying ganesha/gluster as knfsd does?
>=20
>         seek->seek_pos =3D vfs_llseek(file, seek->seek_offset, =
whence);
>         if (seek->seek_pos < 0)
>                 status =3D nfserrno(seek->seek_pos);
>         else if (seek->seek_pos >=3D i_size_read(file_inode(file)))
>                 seek->seek_eof =3D true;
>=20
> It is a working implementation, but not according to RFC description,
>=20
>    If the server can not find a corresponding sa_what,
>    then the status will still be NFS4_OK, but sr_eof would be TRUE.
>=20
> As in this example, there is a HOLE at the end of the file, SEEK(in =
hole,
> SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return
> NFS4ERR_NXIO.

FSAL_GLUSTER always translates lseek return of ENXIO to NFS4_Ok with =
sr_eod TRUE.

It should at least ONLY do that if sa_offset is < filesize (which would =
then be correct per RFC).

Knfsd, to my understanding, looks like it always just returns ENXIO =
(which isn't exactly per RFC, but at least doesn't confuse the client =
and application as badly).

Frank

  reply	other threads:[~2018-09-12 17:02 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-11 12:29 lseek gets bad offset from nfs client with ganesha/gluster which supports SEEK Kinglong Mee
2018-09-11 12:57 ` Trond Myklebust
2018-09-11 14:47   ` Kinglong Mee
2018-09-11 15:43     ` Anna Schumaker
2018-09-11 23:20       ` [NFS-Ganesha-Devel] " Frank Filz
2018-09-12  1:31         ` Kinglong Mee
2018-09-12 11:58           ` Frank Filz [this message]
2018-09-13  0:03             ` Kinglong Mee
2020-09-14 15:02               ` Frank Filz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='012901d44a8f$ed7c6bc0$c8754340$@mindspring.com' \
    --to=ffilzlnx@mindspring.com \
    --cc=devel@lists.nfs-ganesha.org \
    --cc=kinglongmee@gmail.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=schumaker.anna@gmail.com \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).