Re: [GIT PULL] Orangefs (text only resend)

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Al Viro <viro@ZenIV.linux.org.uk>
To: Christoph Hellwig <hch@lst.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Marshall <hubcap@omnibond.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Boaz Harrosh <ooo@electrozaur.com>,
	Greg Kroah-Hartman <greg@kroah.com>
Subject: Re: [GIT PULL] Orangefs (text only resend)
Date: Sun, 6 Sep 2015 10:08:38 +0100	[thread overview]
Message-ID: <20150906090838.GI22011@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20150906063552.GA7224@lst.de>

On Sun, Sep 06, 2015 at 08:35:52AM +0200, Christoph Hellwig wrote:

> On Wed, Sep 02, 2015 at 04:34:41PM -0700, Linus Torvalds wrote:
> >    (a) the iovecs are walked manually (eg
> > pvfs_bufmap_copy_to_user_iovec()). I would really want to see the code
> > use the iov_iter infrastructure
> 
> And that was before we had the full blown iov_iter code.  Note that
> orangefs always does O_DIRECT-style I/O and doesn't go through the
> page cache or does any other similar client side caching except for mmap,
> so it will only use the low-level iov_iter helpers.

FWIW, the thing that looks rather wrong there is playing fast and loose with
kvec -> iovec -> kvec casts.  This kind of thing
+               /* Are we copying to User Virtual Addresses? */
+               if (to_user)
+                       ret = pvfs_bufmap_copy_to_user_iovec(
+                               bufmap,
+                               buffer_index,
+                               vec,
+                               nr_segs,
+                               total_size);
+               /* Are we copying to Kern Virtual Addresses? */
+               else
+                       ret = pvfs_bufmap_copy_to_kernel_iovec(
+                               bufmap,
+                               buffer_index,
+                               vec,
+                               nr_segs,
+                               total_size);
is almost always a sign of really bad API.  "is that a kernel one?"
kind of flags is a bloody bad idea.  So's playing wiht copying iovecs,
modifying those copies, etc.

We might end up with new primitives added out of that, but this kind of
stuff definitely shouldn't be open-coded, especially that way.

Some random notes:

Could we please put
#define PVFS_util_min(x1, x2) (((x1) > (x2)) ? (x2) : (x1))
out of its misery?  It's pretty much the textbook example of the evils
of reinventing the wheels - side effects, arithmetic promotions, etc.

Another thing: why does server want to know about LOOKUP_FOLLOW, of all
things?  Unless I'm misreading what ->lookup() is doing there...

This
+ * pvfs2_link() is only implemented here to make sure that we return a
+ * reasonable error code (the kernel will return a misleading EPERM
+ * otherwise).  PVFS2 does not support hard links.
+ */
+static int pvfs2_link(struct dentry *old_dentry,
+                     struct inode *dir,
+                     struct dentry *dentry)
+{
+       return -EOPNOTSUPP;
+}
is stupid.  Expected error value is not EOPNOTSUPP; pardon the bluntness,
but your idea of what would be less misleading doesn't matter - what matters
is what the _callers_ of link(2), mknod(2), etc. are expecting.  Which is to
say, what does the userland code expect to get.  It's outright promised in
POSIX, actually.

symlink(2) - what happens to too long symlink bodies?  Silent truncation?

+static inline void PVFS_khandle_to(const struct pvfs2_khandle *kh,
+                                  void *p, int size)
+{
+       int i;
+       unsigned char *c = p;
+
+       memset(p, 0, size);
+
+       for (i = 0; i < 16 && i < size; i++)
+               c[i] = kh->u[i];
+}

Er...  If you are using memset(), why open-code memcpy()?  ->u is an array
of unsigned char, so this loop is just a straight memcpy(); nothing fancy
going on there...

May I politely inquire about the reasons for DECLARE_ERRNO_MAPPING_AND_FN 
being in a header?  Or existing at all, while we are at it...  Ditto for
the bit under the tail of that animal - DECLARE_ERRNO_MAPPING, that is.

mask_blocked_signals()/unmask_blocked_signals(): Oleg might want to take
a look at that one (and parties responsible might want to dive for cover,
especially when he figures out what hides behind pvfs2_current_sigaction
and pvfs2_current_signal_lock).

+       /* fill in temporary structure passed to fill_sb method */
+       mount_sb_info.data = data;
+       mount_sb_info.root_khandle =
+               new_op->downcall.resp.fs_mount.root_khandle;
+       mount_sb_info.fs_id = new_op->downcall.resp.fs_mount.fs_id;
+       mount_sb_info.id = new_op->downcall.resp.fs_mount.id;
+
+       /*
+        * the mount_sb_info structure looks odd, but it's used because
+        * the private sb info isn't allocated until we call
+        * pvfs2_fill_sb, yet we have the info we need to fill it with
+        * here.  so we store it temporarily and pass all of the info
+        * to fill_sb where it's properly copied out
+        */
+       mnt_sb_d = mount_nodev(fst,
+                              flags,
+                              (void *)&mount_sb_info,
+                              pvfs2_fill_sb);

If you are fighting an interface, that might be because you are using the
wrong one...  Consider the definition of mount_nodev():
struct dentry *mount_nodev(struct file_system_type *fs_type,
        int flags, void *data,
        int (*fill_super)(struct super_block *, void *, int))
{
        int error;
        struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL);

        if (IS_ERR(s))  
                return ERR_CAST(s);

        error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
        if (error) {
                deactivate_locked_super(s);
                return ERR_PTR(error);
        }
        s->s_flags |= MS_ACTIVE;
        return dget(s->s_root);
}
and observe that everything it calls is exported.  It's a trivial convenience
wrapper for sget(), and if it turns out to be inconvenient - just use sget()
itself and be done with that.  No need to bother with callbacks, having
that mount_sb_info thing, etc.

Is pvfs2_remount() safe to call during pvfs2_unmount_sb() on the same
filesystem?  Incidentally, what's protecting the list of superblocks
while you are walking it and calling pvfs2_remount()?

While we are at it, a lot of GFP_KERNEL allocations seem to be done
with a bunch of locks (including global ones, such as request_mutex) held.
Why won't that deadlock?

+#define pvfs2_lock_inode(inode) spin_lock(&inode->i_lock)
+#define pvfs2_unlock_inode(inode) spin_unlock(&inode->i_lock)

Inhume with extreme prejudice.  ->i_bytes and ->i_blocks are the least
of your worries if two threads hit copy_attributes_to_inode() on the
same inode in parallel.  Which can happen, AFAICS...

+       /*
+        * PVFS2 cannot set size with a setattr operation.  Probably not likely
+        * to be requested through the VFS, but just in case, don't worry about
+        * ATTR_SIZE
+        */
What about truncate(2)?

+       sb->s_fs_info =
+               kmalloc(sizeof(struct pvfs2_sb_info_s), PVFS2_GFP_FLAGS);
+       if (!PVFS2_SB(sb))
+               return -ENOMEM;
+       memset(sb->s_fs_info, 0, sizeof(struct pvfs2_sb_info_s));
+       PVFS2_SB(sb)->sb = sb;
kzalloc()?

+       root_dentry = d_make_root(root);
+       if (!root_dentry) {
+               iput(root);
+               return -ENOMEM;
Double iput() here...

Anyway, bedtime for me - it's 5am here.  I'll post more detailed one after
I get some sleep...

next prev parent reply	other threads:[~2015-09-06  9:08 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-01 15:42 [GIT PULL] Orangefs (text only resend) Mike Marshall
2015-09-02 23:34 ` Linus Torvalds
2015-09-03  1:13   ` Mike Marshall
2015-09-03  1:28     ` Linus Torvalds
2015-09-03 20:22       ` Linus Torvalds
2015-09-03 22:18         ` Mike Marshall
2015-09-03 22:44         ` Greg Kroah-Hartman
2015-09-06  6:35   ` Christoph Hellwig
2015-09-06  9:08     ` Al Viro [this message]
2015-09-06 14:52       ` Mike Marshall
2015-09-06 15:00         ` Mike Marshall
2015-09-06 20:20         ` Al Viro
2015-09-07  6:37           ` Al Viro
2015-09-07 21:10             ` Mike Marshall
2015-09-07 23:22               ` Al Viro
2015-09-08  0:47                 ` Theodore Ts'o
2015-09-08  2:49                   ` Dave Chinner
2015-09-08 14:48                 ` Christoph Hellwig
2015-09-08 18:21                   ` Mike Marshall
2015-09-09 15:05                     ` Mike Marshall
2015-09-11 21:12                       ` Mike Marshall
2015-09-11 22:24                         ` Al Viro
2015-09-13 11:56                           ` Mike Marshall
2015-09-11 23:20                         ` Linus Torvalds
2015-09-13 11:59                           ` Mike Marshall
2015-09-06 14:35     ` Mike Marshall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150906090838.GI22011@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=akpm@linux-foundation.org \
    --cc=greg@kroah.com \
    --cc=hch@lst.de \
    --cc=hubcap@omnibond.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=ooo@electrozaur.com \
    --cc=sfr@canb.auug.org.au \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).