linux-fsdevel.vger.kernel.org archive mirror
* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
       [not found] <540B8040.5010206@gmail.com>
@ 2014-09-07  7:56 ` Mateusz Guzik
  2014-09-11 18:37   ` Piotr Karbowski
  2014-09-11 23:39   ` Chuck Ebbert
  0 siblings, 2 replies; 5+ messages in thread
From: Mateusz Guzik @ 2014-09-07  7:56 UTC (permalink / raw)
  To: Piotr Karbowski, Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
> Hi,
> 
> Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ behaves differently
> than it did in all the pre-3.15 kernels.
> 
> The usecase:
> 
> run /root/testbin (app that just sleeps)
> cp /root/testbin /root/testbin.new
> mv /root/testbin.new /root/testbin
> ls -al /proc/`pidof testbin`/exe
> 
> <=3.14: /root/testbin (deleted)
> >=3.15: /root/testbin.new (deleted)
> 
> Was the change intentional? It renders my system unusable and I could not
> find any information about such a change in the ChangeLog.
> 

It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
names.

Short names share the problem since da1ce0670c14d8 "vfs: add
cross-rename".

The following change to switch_names is the culprit:

-       memcpy(dentry->d_iname, target->d_name.name,
-                   target->d_name.len + 1);
-       dentry->d_name.len = target->d_name.len;
-       return;
+       unsigned int i;
+       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
+       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+               swap(((long *) &dentry->d_iname)[i],
+                    ((long *) &target->d_iname)[i]);
+       }


Dentries can have names from embedded structure or from an external buffer.

If you take a look around you will see the code just swaps pointers for the
"both external" case. But this results in the same behaviour you are seeing.

Not sure how to fix it. Name in 'target' needs to be preserved, but memory
allocation which may be needed for this purpose can fail and switch_names
returns void, just like its callers (not to mention locks held around this).

One crap idea would be to have external buffers with a reference counter.
d_name.name would still point to the buffer and freeing funcs would use
container_of to get to the counter.

I can implement that later if it sounds sane enough.

Note this behaviour seems to be a requirement for cross-rename to work.

At least restoring previous behaviour while keeping cross-rename is not hard,
I can write it later.

-- 
Mateusz Guzik


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-07  7:56 ` /proc/<pid>/exe symlink behavior change in >=3.15 Mateusz Guzik
@ 2014-09-11 18:37   ` Piotr Karbowski
  2014-09-11 23:39   ` Chuck Ebbert
  1 sibling, 0 replies; 5+ messages in thread
From: Piotr Karbowski @ 2014-09-11 18:37 UTC (permalink / raw)
  To: Mateusz Guzik, Al Viro; +Cc: linux-kernel, linux-fsdevel

Hi Mateusz,

> I can implement that later if it sounds sane enough.
> Note this behaviour seems to be a requirement for cross-rename to work.
>
> At least restoring previous behaviour while keeping cross-rename is not hard,
> I can write it later.

Have you found a proper solution for this issue, or anything that could
help move it forward?

Thankfully this issue has been addressed by a workaround in fs/dcache.c
in the grsecurity patch for 3.16.2
(grsecurity-3.0-3.16.2-201409082129.patch), but it would be great to have
a fix upstream.

-- Piotr.


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-07  7:56 ` /proc/<pid>/exe symlink behavior change in >=3.15 Mateusz Guzik
  2014-09-11 18:37   ` Piotr Karbowski
@ 2014-09-11 23:39   ` Chuck Ebbert
  2014-09-11 23:57     ` Mateusz Guzik
  1 sibling, 1 reply; 5+ messages in thread
From: Chuck Ebbert @ 2014-09-11 23:39 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Piotr Karbowski, Al Viro, linux-kernel, linux-fsdevel,
	Miklos Szeredi

On Sun, 7 Sep 2014 09:56:08 +0200
Mateusz Guzik <mguzik@redhat.com> wrote:

> On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
> > Hi,
> > 
> > Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ behaves differently
> > than it did in all the pre-3.15 kernels.
> > 
> > The usecase:
> > 
> > run /root/testbin (app that just sleeps)
> > cp /root/testbin /root/testbin.new
> > mv /root/testbin.new /root/testbin
> > ls -al /proc/`pidof testbin`/exe
> > 
> > <=3.14: /root/testbin (deleted)
> > >=3.15: /root/testbin.new (deleted)
> > 
> > Was the change intentional? It renders my system unusable and I could not
> > find any information about such a change in the ChangeLog.
> > 
> 
> It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
> names.
> 
> Short names share the problem since da1ce0670c14d8 "vfs: add
> cross-rename".
> 
> The following change to switch_names is the culprit:
> 
> -       memcpy(dentry->d_iname, target->d_name.name,
> -                   target->d_name.len + 1);
> -       dentry->d_name.len = target->d_name.len;
> -       return;
> +       unsigned int i;
> +       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> +       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
> +               swap(((long *) &dentry->d_iname)[i],
> +                    ((long *) &target->d_iname)[i]);
> +       }
> 
> 
> Dentries can have names from embedded structure or from an external buffer.
> 
> If you take a look around you will see the code just swaps pointers for the
> "both external" case. But this results in the same behaviour you are seeing.
> 

Looks like the real problem here is that __d_materialise_dentry() needs the
old behavior of switch_names(). At least that's how it got fixed in grsecurity.


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-11 23:39   ` Chuck Ebbert
@ 2014-09-11 23:57     ` Mateusz Guzik
  2014-09-15 14:14       ` Miklos Szeredi
  0 siblings, 1 reply; 5+ messages in thread
From: Mateusz Guzik @ 2014-09-11 23:57 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Piotr Karbowski, Al Viro, linux-kernel, linux-fsdevel,
	Miklos Szeredi

On Thu, Sep 11, 2014 at 06:39:58PM -0500, Chuck Ebbert wrote:
> On Sun, 7 Sep 2014 09:56:08 +0200
> Mateusz Guzik <mguzik@redhat.com> wrote:
> 
> > On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
> > > Hi,
> > > 
> > > Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ behaves differently
> > > than it did in all the pre-3.15 kernels.
> > > 
> > > The usecase:
> > > 
> > > run /root/testbin (app that just sleeps)
> > > cp /root/testbin /root/testbin.new
> > > mv /root/testbin.new /root/testbin
> > > ls -al /proc/`pidof testbin`/exe
> > > 
> > > <=3.14: /root/testbin (deleted)
> > > >=3.15: /root/testbin.new (deleted)
> > > 
> > > Was the change intentional? It renders my system unusable and I could not
> > > find any information about such a change in the ChangeLog.
> > > 
> > 
> > It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
> > names.
> > 
> > Short names share the problem since da1ce0670c14d8 "vfs: add
> > cross-rename".
> > 
> > The following change to switch_names is the culprit:
> > 
> > -       memcpy(dentry->d_iname, target->d_name.name,
> > -                   target->d_name.len + 1);
> > -       dentry->d_name.len = target->d_name.len;
> > -       return;
> > +       unsigned int i;
> > +       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> > +       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
> > +               swap(((long *) &dentry->d_iname)[i],
> > +                    ((long *) &target->d_iname)[i]);
> > +       }
> > 
> > 
> > Dentries can have names from embedded structure or from an external buffer.
> > 
> > If you take a look around you will see the code just swaps pointers for the
> > "both external" case. But this results in the same behaviour you are seeing.
> > 
> 
> Looks like the real problem here is that __d_materialise_dentry() needs the
> old behavior of switch_names(). At least that's how it got fixed in grsecurity.

No.

The regression in question is an effect of swap replacing memcpy in
switch_names, as called by d_move. The fix in grsecurity reverts to the
previous behaviour when needed and imho should be applied for the time being.

The real problem is that __d_move always switches parent dentry and
calls switch_names, which actually switches names in some cases.

Without the regression you get expected results only for short names
when you move stuff around within the same directory.

For instance with current code:
mv /foo/bar/baz /1/2/3

will replace the whole path.

Previous behaviour would result in /foo/bar/3 as the new path, which is
clearly still incorrect.

Leaving the old dentry under the same parent would mean that the "tree"
associated with now moved dentry will possibly need to be freed.

In addition to that, one has to deal with giving the renamed dentry its
new name, which possibly came from an external buffer. An idea I came up
with (atomic_t refcount; char name[0]; with ->name assigned to dentry) may
require adding an additional field to struct dentry, which would be bad.

I didn't have the time yet to look at this stuff properly.

-- 
Mateusz Guzik


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-11 23:57     ` Mateusz Guzik
@ 2014-09-15 14:14       ` Miklos Szeredi
  0 siblings, 0 replies; 5+ messages in thread
From: Miklos Szeredi @ 2014-09-15 14:14 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: Chuck Ebbert, Piotr Karbowski, Al Viro, LKML, linux-fsdevel

On Fri, Sep 12, 2014 at 1:57 AM, Mateusz Guzik <mguzik@redhat.com> wrote:
> On Thu, Sep 11, 2014 at 06:39:58PM -0500, Chuck Ebbert wrote:
>> On Sun, 7 Sep 2014 09:56:08 +0200
>> Mateusz Guzik <mguzik@redhat.com> wrote:
>>
>> > On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
>> > > Hi,
>> > >
>> > > Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ behaves differently
>> > > than it did in all the pre-3.15 kernels.
>> > >
>> > > The usecase:
>> > >
>> > > run /root/testbin (app that just sleeps)
>> > > cp /root/testbin /root/testbin.new
>> > > mv /root/testbin.new /root/testbin
>> > > ls -al /proc/`pidof testbin`/exe
>> > >
>> > > <=3.14: /root/testbin (deleted)
>> > > >=3.15: /root/testbin.new (deleted)
>> > >
>> > > Was the change intentional? It renders my system unusable and I could not
>> > > find any information about such a change in the ChangeLog.

Piotr, what exactly happens?  How does this break your system?

>> > >
>> >
>> > It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
>> > names.
>> >
>> > Short names share the problem since da1ce0670c14d8 "vfs: add
>> > cross-rename".
>> >
>> > The following change to switch_names is the culprit:
>> >
>> > -       memcpy(dentry->d_iname, target->d_name.name,
>> > -                   target->d_name.len + 1);
>> > -       dentry->d_name.len = target->d_name.len;
>> > -       return;
>> > +       unsigned int i;
>> > +       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
>> > +       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
>> > +               swap(((long *) &dentry->d_iname)[i],
>> > +                    ((long *) &target->d_iname)[i]);
>> > +       }
>> >
>> >
>> > Dentries can have names from embedded structure or from an external buffer.
>> >
>> > If you take a look around you will see the code just swaps pointers for the
>> > "both external" case. But this results in the same behaviour you are seeing.
>> >
>>
>> Looks like the real problem here is that __d_materialise_dentry() needs the
>> old behavior of switch_names(). At least that's how it got fixed in grsecurity.
>
> No.
>
> The regression in question is an effect of swap replacing memcpy in
> switch_names, as called by d_move. The fix in grsecurity reverts to the
> previous behaviour when needed and imho should be applied for the time being.

Ack for that.  Linus will happily take this on the grounds of backward
compatibility, even if the old behavior was arguably more crazy than
the new one.

>
> The real problem is that __d_move always switches parent dentry and
> calls switch_names, which actually switches names in some cases.
>
> Without the regression you get expected results only for short names
> when you move stuff around within the same directory.
>
> For instance with current code:
> mv /foo/bar/baz /1/2/3
>
> will replace the whole path.
>
> Previous behaviour would result in /foo/bar/3 as the new path, which is
> clearly still incorrect.
>
> Leaving the old dentry under the same parent would mean that the "tree"
> associated with now moved dentry will possibly need to be freed.

It's done by dput().  But callers need to hold a ref to the old parent
anyway (because of locking), so it's not going to go away in d_move(),
only after everything is done.

>
> In addition to that, one has to deal with giving the renamed dentry its
> new name, which possibly came from an external buffer. An idea I came up
> with (atomic_t refcount; char name[0]; with ->name assigned to dentry) may
> require adding an additional field to struct dentry, which would be bad.

You can do that without an extra field: e.g. use container_of() to get
the refcounted struct from the name.

Consider using kref for refcounting.

Thanks,
Miklos
