Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.

linux-fsdevel.vger.kernel.org archive mirror

From: NeilBrown <neilb@suse.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Anna Schumaker <Anna.Schumaker@netapp.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"linux-nfs\@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Lennart Poettering <lennart@poettering.net>,
	Pavel Emelyanov <xemul@virtuozzo.com>, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS.
Date: Fri, 08 Dec 2017 13:17:31 +1100
Message-ID: <878teeq7yc.fsf@notabene.neil.brown.name>
In-Reply-To: <CAOQ4uxgg0cugypouB=Wyk1Uevm9EyqxpsHPaVqy0LxgS-bNGKw@mail.gmail.com>


On Thu, Dec 07 2017, Amir Goldstein wrote:

> On Thu, Dec 7, 2017 at 5:20 AM, NeilBrown <neilb@suse.com> wrote:
>> On Wed, Dec 06 2017, Linus Torvalds wrote:
>>
>>> On Thu, Nov 30, 2017 at 12:56 PM, NeilBrown <neilb@suse.com> wrote:
>>>>
>>>> -/* limit the handle size to NFSv4 handle size now */
>>>> -#define MAX_HANDLE_SZ 128
>>>> +/* Must be larger than NFSv4 file handle, but small
>>>> + * enough for an on-stack allocation. overlayfs doesn't
>>>> + * want this too close to 255.
>>>> + */
>>>> +#define MAX_HANDLE_SZ 200
>>>
>>> This really smells for so many reasons.
>>>
>>> Also, that really is starting to be a fairly big stack allocation, and
>>> it seems to be used in exactly one place (show_mark_fhandle), which
>>> makes me go "why is that on the stack anyway?".
>>>
>>> Could we just allocate a buffer at open time or something?
>>>
>>>                Linus
>>
>> "open time" would be when /proc/X/fdinfo/Y was opened in
>> seq_fdinfo_open(), and allocating a file_handle there seems a bit odd.
>>
>> We can allocate in fs/notify/fdinfo.c:show_fdinfo() which is
>> the earliest 'notify' specific code to run.  There is no
>> opportunity to return an error but GFP_KERNEL allocations under 1 page
>> never fail.
>>
>> This patch allocates a single buffer for all inodes reported for a given
>> inotify fdinfo, and if the allocation fails, the filehandle is silently
>> left blank.  More surgery would be needed to be able to return an error.
>>
>> Is that at all suitable?
>>
>> Thanks,
>> NeilBrown
>>
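
A userspace sketch of the allocation strategy proposed above: one buffer per fdinfo read, shared by every mark and freed at the end. If the allocation fails, every line is still printed, just without the trailing handle. The names (struct mark, show_mark(), show_all(), HANDLE_BUF_SZ) and the 4-byte placeholder "encoding" are hypothetical. The real code in fs/notify/fdinfo.c calls exportfs_encode_inode_fh() instead.

```c
/* Request POSIX.1-2008 so open_memstream() (handy for capturing output) is declared. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HANDLE_BUF_SZ 200   /* plays the role of MAX_HANDLE_SZ in the patch */

/* Hypothetical stand-in for one fsnotify mark. */
struct mark {
    unsigned long ino;
};

/* Print one mark; append a handle only when a buffer is available. */
static void show_mark(FILE *m, const struct mark *mk, unsigned char *fhbuf)
{
    fprintf(m, "ino:%lx", mk->ino);
    if (fhbuf) {
        /* Placeholder "encoding": low 4 bytes of the inode number, least
         * significant first. The kernel would call
         * exportfs_encode_inode_fh() here instead. */
        for (int i = 0; i < 4; i++)
            fhbuf[i] = (unsigned char)(mk->ino >> (8 * i));
        fprintf(m, " f_handle:");
        for (int i = 0; i < 4; i++)
            fprintf(m, "%02x", fhbuf[i]);
    }
    fputc('\n', m);
}

/* Allocate once, reuse the buffer for every mark, free at the end.
 * simulate_oom forces the allocation-failure path for demonstration. */
static void show_all(FILE *m, const struct mark *marks, size_t nmarks,
                     int simulate_oom)
{
    unsigned char *fhbuf = simulate_oom ? NULL : malloc(HANDLE_BUF_SZ);

    assert(m != NULL);
    for (size_t i = 0; i < nmarks; i++)
        show_mark(m, &marks[i], fhbuf);
    free(fhbuf);            /* free(NULL) is a harmless no-op */
}
```

With simulate_oom set, show_all() emits the same lines minus the " f_handle:..." suffix, so only the end of each line changes, as the commit message below argues.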
>> From: NeilBrown <neilb@suse.com>
>> Subject: fs/notify: don't put file handle buffer on stack.
>>
>> A file handle buffer is not tiny, and could need to be larger in future,
>> so it isn't safe to allocate one on the stack.  Instead, we need to
>> kmalloc().
>>
>> There is no way to return an error status from a ->show_fdinfo()
>> function, so if the kmalloc fails, we silently exclude the filehandle
>> from the output.  As it is at the end of the line, this shouldn't
>> upset parsing too much.
>
> It shouldn't upset parsing, because that would be the same output
> as without CONFIG_EXPORTFS. AFAIK this information
> is used by CRIU.
>
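
To illustrate why a missing handle at the end of the line is harmless, here is a sketch of a parser for one inotify entry in /proc/PID/fdinfo/FD that treats the handle fields as optional. The layout (inotify wd:<hex> ino:<hex> sdev:<hex> mask:<hex> ignored_mask:<hex>, optionally followed by fhandle-bytes:<hex> fhandle-type:<hex> f_handle:<hex bytes>) is assumed from fs/notify/fdinfo.c of this era. struct inotify_entry and parse_inotify_line() are hypothetical names, not CRIU's actual parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed inotify line from fdinfo (hypothetical structure). */
struct inotify_entry {
    unsigned int wd, sdev, mask, ignored_mask;
    unsigned long ino;
    int has_handle;                  /* 0 when the handle fields are absent */
    unsigned int handle_bytes, handle_type;
    char handle_hex[2 * 256 + 1];    /* up to 256 handle bytes as hex + NUL */
};

/* Returns 0 on success, -1 if the mandatory leading fields are missing. */
static int parse_inotify_line(const char *line, struct inotify_entry *e)
{
    const char *fh;

    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof(*e));
    if (sscanf(line, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x",
               &e->wd, &e->ino, &e->sdev, &e->mask, &e->ignored_mask) != 5)
        return -1;

    /* The handle comes last, so its absence leaves the fields above intact. */
    fh = strstr(line, "fhandle-bytes:");
    if (fh && sscanf(fh, "fhandle-bytes:%x fhandle-type:%x f_handle:%512[0-9a-fA-F]",
                     &e->handle_bytes, &e->handle_type, e->handle_hex) == 3)
        e->has_handle = 1;
    return 0;
}
```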
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>>
>> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
>> index d478629c728b..20d863b9ae16 100644
>> --- a/fs/notify/fdinfo.c
>> +++ b/fs/notify/fdinfo.c
>> @@ -23,56 +23,58 @@
>>
>>  static void show_fdinfo(struct seq_file *m, struct file *f,
>>                         void (*show)(struct seq_file *m,
>> -                                    struct fsnotify_mark *mark))
>> +                                    struct fsnotify_mark *mark,
>> +                                    struct fid *fh))
>>  {
>>         struct fsnotify_group *group = f->private_data;
>>         struct fsnotify_mark *mark;
>> +       struct fid *fh = kmalloc(MAX_HANDLE_SZ, GFP_KERNEL);
>>
>>         mutex_lock(&group->mark_mutex);
>>         list_for_each_entry(mark, &group->marks_list, g_list) {
>> -               show(m, mark);
>> +               show(m, mark, fh);
>>                 if (seq_has_overflowed(m))
>>                         break;
>>         }
>>         mutex_unlock(&group->mark_mutex);
>> +       kfree(fh);
>>  }
>>
>>  #if defined(CONFIG_EXPORTFS)
>> -static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
>> +static void show_mark_fhandle(struct seq_file *m, struct inode *inode,
>> +                             struct fid *fhbuf)
>>  {
>> -       struct {
>> -               struct file_handle handle;
>> -               u8 pad[MAX_HANDLE_SZ];
>> -       } f;
>>         int size, ret, i;
>> +       unsigned char *bytes;
>>
>> -       f.handle.handle_bytes = sizeof(f.pad);
>> -       size = f.handle.handle_bytes >> 2;
>> +       if (!fhbuf)
>> +               return;
>> +       size = MAX_HANDLE_SZ >> 2;
>>
>> -       ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
>> +       ret = exportfs_encode_inode_fh(inode, fhbuf, &size, 0);
>>         if ((ret == FILEID_INVALID) || (ret < 0)) {
>>                 WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
>
> This WARN_ONCE is out of order. It is perfectly valid for inotify/fanotify
> to watch over fs that doesn't support exportfs. Care to clean it up?
> Perhaps a pr_warn_ratelimited() for either !fhbuf or can't encode?
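
A userspace model of the failure handling under discussion: the handle is printed only when a buffer exists and encoding succeeds, and is otherwise omitted without a warning. It also shows the unit conversion in the patch. The encoder's length is counted in 32-bit words (MAX_HANDLE_SZ >> 2, so 200 bytes is 50 words) and converted back to bytes for output. encode_fn, the two demo encoders and FILEID_INVALID_MODEL are hypothetical stand-ins for exportfs_encode_inode_fh() and FILEID_INVALID. The printed fields are assumed to follow what show_mark_fhandle() emits.

```c
#define _POSIX_C_SOURCE 200809L   /* open_memstream(), for capturing output */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HANDLE_BUF_SZ 200            /* MAX_HANDLE_SZ as proposed in the patch */
#define FILEID_INVALID_MODEL 0xff    /* stand-in for the kernel's FILEID_INVALID */

/* Hypothetical encoder: takes capacity in 32-bit words via *max_words,
 * stores the words used there, and returns a handle type (or a negative
 * errno / FILEID_INVALID_MODEL on failure). */
typedef int (*encode_fn)(uint32_t *buf, int *max_words);

/* Demo encoder that succeeds with a 2-word (8-byte) handle of type 1. */
static int demo_encode_two_words(uint32_t *buf, int *max_words)
{
    if (*max_words < 2)
        return FILEID_INVALID_MODEL;
    memset(buf, 0x01, 2 * sizeof(*buf));
    *max_words = 2;
    return 1;
}

/* Demo encoder for a filesystem that cannot produce handles. */
static int demo_encode_fail(uint32_t *buf, int *max_words)
{
    (void)buf;
    (void)max_words;
    return FILEID_INVALID_MODEL;
}

static void print_fhandle(FILE *m, uint32_t *fhbuf, encode_fn encode)
{
    const unsigned char *bytes = (const unsigned char *)fhbuf;
    int words, type;

    if (!fhbuf)
        return;                        /* allocation failed: omit silently */
    words = HANDLE_BUF_SZ >> 2;        /* capacity: 200 bytes = 50 words */
    type = encode(fhbuf, &words);
    if (type < 0 || type == FILEID_INVALID_MODEL)
        return;                        /* cannot encode: omit, no WARN */
    assert(words >= 0 && words <= HANDLE_BUF_SZ >> 2);

    fprintf(m, "fhandle-bytes:%x fhandle-type:%x f_handle:",
            (unsigned)(words << 2), (unsigned)type);
    for (int i = 0; i < words << 2; i++)   /* words -> bytes */
        fprintf(m, "%02x", bytes[i]);
}
```

Whether to stay silent here or emit a rate-limited warning, as suggested above, is a policy choice; the model simply shows the silent variant.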

If I were going to clean it up, I would need to do more than remove the
WARN_ONCE(), which almost certainly never fires.

exportfs_encode_inode_fh() should only be called if sb->s_export_op is
not NULL.  When it is NULL, the filesystem doesn't support file handles.
do_sys_name_to_handle() tests this, as does nfsd, but this inotify code
doesn't.  So it can report a "file handle" for a file on a filesystem
that doesn't support file handles.  It will then use the default
export_encode_fh(), which reports i_ino and i_generation, and those may
or may not be stable or meaningful.
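
From userspace, the check that do_sys_name_to_handle() performs shows up as EOPNOTSUPP from name_to_handle_at(2). The sketch below relies on that and on the documented convention that a call with handle_bytes == 0 fails with EOVERFLOW and reports the required size. The buffer can then be sized for the filesystem (handles larger than the old 128-byte MAX_HANDLE_SZ, as discussed in this thread) and allocated on the heap. get_handle() is a hypothetical helper name. It assumes glibc, which declares name_to_handle_at() under _GNU_SOURCE.

```c
#define _GNU_SOURCE                  /* name_to_handle_at() in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Return a malloc()ed file handle for `path` (caller frees), or NULL with
 * errno set, e.g. EOPNOTSUPP when the filesystem has no export support. */
static struct file_handle *get_handle(const char *path, int *mount_id)
{
    struct file_handle probe = { .handle_bytes = 0 };
    struct file_handle *fh;
    int saved;

    assert(path != NULL && mount_id != NULL);

    /* Probe: with handle_bytes == 0 the call should fail with EOVERFLOW
     * and store the required size in probe.handle_bytes. */
    if (name_to_handle_at(AT_FDCWD, path, &probe, mount_id, 0) == 0) {
        errno = EINVAL;              /* unexpected: no size was reported */
        return NULL;
    }
    if (errno != EOVERFLOW)
        return NULL;                 /* EOPNOTSUPP, ENOENT, ... */

    fh = malloc(sizeof(*fh) + probe.handle_bytes);
    if (!fh)
        return NULL;                 /* errno is ENOMEM */
    fh->handle_bytes = probe.handle_bytes;
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == -1) {
        saved = errno;
        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}
```

A caller that gets NULL with errno == EOPNOTSUPP can treat the filesystem as having no file handles. The inotify fdinfo code does not currently make this distinction.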

So yes, this code does need a bit of cleaning up....

Thanks,
NeilBrown


Thread overview: 11+ messages
2017-11-30 20:56 [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS NeilBrown
2017-12-04  3:27 ` [PATCH] fhandle: avoid -EINVAL if requested size is too large NeilBrown
2017-12-06 19:05 ` [PATCH] NFS: allow name_to_handle_at() to work for Amazon EFS J. Bruce Fields
2017-12-07  2:07 ` Linus Torvalds
2017-12-07  3:20   ` NeilBrown
2017-12-07  4:04     ` Amir Goldstein
2017-12-08  2:17       ` NeilBrown [this message]
2017-12-19 12:42         ` Jan Kara
2017-12-20 21:23           ` NeilBrown
2017-12-07  5:34     ` Matthew Wilcox