linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Orangefs, v4.5 and the merge window...
Date: Fri, 11 Mar 2016 21:47:45 +0000	[thread overview]
Message-ID: <20160311214745.GT17997@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CAOg9mSSzJZiv=qBTuLWaEF+FJcVA=UhXtxFPwxiYG8mkn-Rj7Q@mail.gmail.com>

On Fri, Mar 11, 2016 at 03:18:57PM -0500, Mike Marshall wrote:
> Greetings...
> 
> The Orangefs for-next tree is:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux.git
> for-next
> 
> I did a test merge (just locally, not pushed out) of Orangefs:for-next
> and v4.5-rc7 so I could test out how I think I need to patch for
> the follow_link -> get_link change, the diff is below.
> 
> On Monday next, assuming that v4.5 is finalized this weekend,
> I plan to do a actual merge with v4.5, apply the get_link patch
> and push that to Orangefs:for-next.
> 
> Hi Al <g>... might we get an ACK this time around?

You do realize that it will mean fun few weeks post-merge fixing the rest of
problems, right?  FWIW, I think that right now it *is* at the state where it
such fixing is feasible, so modulo that...

As far as I can see, waiting-related logics should be solid by now, ditto
for lifetime rules; sanitizing the input...  listxattr still does need fixing
(feed it a negative in ->downcall.resp.listxattr.lengths[0] and watch Bad
Things(tm) happen; no idea why would anyone go for
fs/orangefs/downcall.h:82:      __s32 lengths[ORANGEFS_MAX_XATTR_LISTLEN];
for representing string lengths in the first place, but that's what you've
got there and no sanity checks are done on it beyond
                if (total + new_op->downcall.resp.listxattr.lengths[i] > size)
                        goto done;
which is not enough - not with total and size being ssize_t and ...lengths[] -
signed 32bit).

The logics around maintaining the list of orangefs superblocks (add/remove/
traverse) needs fixing; right now ioctl(..., ORANGEFS_DEV_REMOUNT_ALL) will
walk through it with only request_mutex held.  Both insertion and removal
are protected only by orangefs_superblocks_lock, and removal is insane -
        struct list_head *tmp_safe = NULL;                              \
        struct orangefs_sb_info_s *orangefs_sb = NULL;                  \
                                                                        \
        spin_lock(&orangefs_superblocks_lock);                          \
        list_for_each_safe(tmp, tmp_safe, &orangefs_superblocks) {      \
                orangefs_sb = list_entry(tmp,                           \
                                      struct orangefs_sb_info_s,        \
                                      list);                            \
                if (orangefs_sb && (orangefs_sb->sb == sb)) {           \
                        gossip_debug(GOSSIP_SUPER_DEBUG,                \
                            "Removing SB %p from orangefs superblocks\n",      \
                        orangefs_sb);                                   \
                        list_del(&orangefs_sb->list);                   \
                        break;                                          \
                }                                                       \
        }                                                               \
        spin_unlock(&orangefs_superblocks_lock);                        \
list_entry is never NULL, for starters, and since there is a pointer back
from superblock to that orangefs_sb_info, there's no reason to walk the entire
list to find one.  BTW, both add_orangefs_sb() and remove_orangefs_sb() should
be taken to their sole users.

Sanity aside, there's really no lock in common for list modifiers and list
walker I'd mentioned above.  FWIW, I would make orangefs_remount()
take struct orangefs_sb_info instead of struct super_block and flipped the
order of operations in orangefs_kill_sb() - kill_anon_super() *first*, then
remove from the list, then tell the userland that it's going away (i.e.
call orangefs_unmount_sb()).  request_mutex in the last one would, at least,
prevent freeing the sucker before orangefs_remount() is done with it.

Walking the list and calling orangefs_remount() on everything would still need
care - you'd need to hold orangefs_superblocks_lock, drop it for actual calls
of orangefs_remount() and have list removal preserve the forward pointer.

That's probably the worst remaining locking issue I see in there.  Doable,
if not pleasant...

IIRC, there also had been some unpleasantness with getattr messing ->i_mode
halfway through... <checks>  Yes - copy_attributes_to_inode() will be called,
and do
        inode->i_mode = orangefs_inode_perms(attrs);
...
                inode->i_mode |= S_IFLNK;
...
                        strncpy(orangefs_inode->link_target,
                                symname,
                                ORANGEFS_NAME_MAX);
If nothing else, *another* stat(2) racing with this one could pick the
intermediate value of ->i_mode and proceed to report it to userland.
Another problem is overwriting the symlink body; that can get very
unpleasant, since it might be traversed by another syscall right at that
moment.  Any change of a symlink body means "we'd missed it going stale"; 
there is no way to change a symlink contents without removing it and
creating a new one.  Should anything other than orangefs_iget() even bother
copying it?  The same goes for inode type changes, of course (regular
vs. directory vs. symlink, etc.).

Speaking of orangefs_iget(), orangefs_set_inode() is pointlessly paranoid.
Not a bug per se, but
        struct orangefs_inode_s *orangefs_inode = NULL;

        /* Make sure that we have sane parameters */
        if (!data || !inode)  
                return 0;
        orangefs_inode = ORANGEFS_I(inode);
        if (!orangefs_inode)
                return 0;
is all wrong - 'data' is the last argument passed to iget5_locked (i.e. 'ref'
of orangefs_iget()) and that's always an address of either a local variable
or of a field in a large structure, and not even the first one; 'inode'
is never NULL - it's the address of struct inode the caller is about to
insert into the hash chain; ORANGEFS_I() is container_of(), so it's not
going to be NULL either.

I'll need to look through the archived threads to see if there's anything
else left; IIRC, debugfs-related code had seriously nasty issues in case of
allocation failures, but those were fairly isolated.  I'll read through the
archive tomorrow and see if there's anything else mentioned and not dealt
with; I don't remember anything really bad, but it had been well over
a hundred mails starting about half a year ago; I sure as hell do not
remember every tangential subthread in all of that, so I'll need to recheck.

I _think_ that all remaining issues can be quickly dealt with, and the code
has zero impact on the rest of the kernel.  I wouldn't risk putting it into
-final without fixups, but as for the merge schedule... either merge it
before -rc1 and fix it up by -rc3 or so, or fix it during the window and
merge at around -rc2 - I'm fine with either variant.

  reply	other threads:[~2016-03-11 21:47 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-11 20:18 Orangefs, v4.5 and the merge window Mike Marshall
2016-03-11 21:47 ` Al Viro [this message]
2016-03-11 22:35   ` Mike Marshall
2016-03-14 21:03     ` Mike Marshall
2016-03-26  0:21       ` Al Viro
2016-03-26  1:00         ` Mike Marshall
     [not found]         ` <CA+55aFzLC_pdj_ds82YYab5D7jpYMj26s0Frofxxhk=j7SqnjA@mail.gmail.com>
2016-03-26  1:01           ` Al Viro
2016-03-26  1:07             ` Mike Marshall
     [not found]             ` <CA+55aFysWS9mP+QgfAR6LZpEbkp61MUPQu0zDoq7cafmr3M8SA@mail.gmail.com>
2016-03-26  3:55               ` Mike Marshall
2016-03-26  4:30                 ` Al Viro
2016-03-26 12:07                   ` Mike Marshall
2016-03-26 14:47                     ` Al Viro
2016-03-26 15:34                       ` Mike Marshall
2016-03-26 15:50                         ` Al Viro
2016-03-26 17:36                           ` Mike Marshall
2016-03-26 18:28                             ` Al Viro
2016-03-26 18:37                               ` Al Viro
2016-03-26 19:00                                 ` Mike Marshall
2016-03-26 19:51                                   ` Linus Torvalds
2016-03-26 20:47                                     ` Mike Marshall
2016-03-26 21:00                                       ` Linus Torvalds
2016-03-26  1:02           ` Mike Marshall
2016-03-15  4:04   ` Martin Brandenburg
2016-03-15 16:45     ` Martin Brandenburg
2016-03-17 20:45       ` [PATCH] orangefs: getattr work (was: Re: Orangefs, v4.5 and the merge window...) Martin Brandenburg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160311214745.GT17997@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hubcap@omnibond.com \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).