public inbox for linux-kernel@vger.kernel.org
* /proc/<pid>/exe symlink behavior change in >=3.15.
@ 2014-09-06 21:44 Piotr Karbowski
  2014-09-06 23:02 ` Richard Weinberger
  2014-09-07  7:56 ` Mateusz Guzik
  0 siblings, 2 replies; 8+ messages in thread
From: Piotr Karbowski @ 2014-09-06 21:44 UTC (permalink / raw)
  To: linux-kernel

Hi,

Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ acts 
differently than it used to in all the pre-3.15 kernels.

The usecase:

run /root/testbin (app that just sleeps)
cp /root/testbin /root/testbin.new
mv /root/testbin.new /root/testbin
ls -al /proc/`pidof testbin`/exe

<=3.14: /root/testbin (deleted)
 >=3.15: /root/testbin.new (deleted)
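
A minimal C sketch of how a tool might inspect this link programmatically, assuming all it needs is the reported path and whether the kernel marked it as deleted. The helper names split_exe_link and read_exe_link are hypothetical; readlink() is the standard POSIX call and does not NUL-terminate its buffer. With the reproducer above, the split path would be /root/testbin on <=3.14 and /root/testbin.new on 3.15, with the deleted flag set in both cases.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Suffix the kernel appends to the link text once the file is unlinked. */
#define DELETED_SUFFIX " (deleted)"

/*
 * Hypothetical helper: copy the link text into 'path' without the
 * " (deleted)" suffix; return 1 if the suffix was present, else 0.
 * Output is truncated to n - 1 bytes. The check is ambiguous for a file
 * whose real name happens to end in " (deleted)".
 */
int split_exe_link(const char *link, char *path, size_t n)
{
    size_t len = strlen(link);
    size_t slen = strlen(DELETED_SUFFIX);
    int deleted = len >= slen && strcmp(link + len - slen, DELETED_SUFFIX) == 0;
    size_t keep = deleted ? len - slen : len;

    assert(n > 0);
    if (keep > n - 1)
        keep = n - 1;
    memcpy(path, link, keep);
    path[keep] = '\0';
    return deleted;
}

/* Read /proc/<pid>/exe into buf (NUL-terminated); returns -1 on error. */
ssize_t read_exe_link(pid_t pid, char *buf, size_t n)
{
    char proc[32];
    ssize_t r;

    assert(n > 0);
    snprintf(proc, sizeof(proc), "/proc/%d/exe", (int)pid);
    r = readlink(proc, buf, n - 1);   /* truncates silently if too long */
    if (r >= 0)
        buf[r] = '\0';
    return r;
}
```

A caller would pass the text returned by read_exe_link(pid, ...) to split_exe_link() and compare the resulting path against the file it expects the process to be running.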

Was the change intentional? It does render my system unusable and I 
failed to find any information about such a change in the ChangeLog.

-- Piotr.


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-06 21:44 /proc/<pid>/exe symlink behavior change in >=3.15 Piotr Karbowski
@ 2014-09-06 23:02 ` Richard Weinberger
  2014-09-07 10:20   ` Piotr Karbowski
  2014-09-07  7:56 ` Mateusz Guzik
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Weinberger @ 2014-09-06 23:02 UTC (permalink / raw)
  To: Piotr Karbowski; +Cc: LKML

On Sat, Sep 6, 2014 at 11:44 PM, Piotr Karbowski
<piotr.karbowski@gmail.com> wrote:
> Hi,
>
> Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ acts differently
> than it used to in all the pre-3.15 kernels.
>
> The usecase:
>
> run /root/testbin (app that just sleeps)
> cp /root/testbin /root/testbin.new
> mv /root/testbin.new /root/testbin
> ls -al /proc/`pidof testbin`/exe
>
> <=3.14: /root/testbin (deleted)
> >=3.15: /root/testbin.new (deleted)
>
> Was the change intentional? It does render my system unusable and I failed
> to find any information about such a change in the ChangeLog.

Can you please find the offending commit?
With an automated bisect run this should be easy to do.

-- 
Thanks,
//richard


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-06 21:44 /proc/<pid>/exe symlink behavior change in >=3.15 Piotr Karbowski
  2014-09-06 23:02 ` Richard Weinberger
@ 2014-09-07  7:56 ` Mateusz Guzik
  2014-09-11 18:37   ` Piotr Karbowski
  2014-09-11 23:39   ` Chuck Ebbert
  1 sibling, 2 replies; 8+ messages in thread
From: Mateusz Guzik @ 2014-09-07  7:56 UTC (permalink / raw)
  To: Piotr Karbowski, Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
> Hi,
> 
> Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ acts differently
> than it used to in all the pre-3.15 kernels.
> 
> The usecase:
> 
> run /root/testbin (app that just sleeps)
> cp /root/testbin /root/testbin.new
> mv /root/testbin.new /root/testbin
> ls -al /proc/`pidof testbin`/exe
> 
> <=3.14: /root/testbin (deleted)
> >=3.15: /root/testbin.new (deleted)
> 
> Was the change intentional? It does render my system unusable and I failed
> to find any information about such a change in the ChangeLog.
> 

It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
names.

Short names share the problem since da1ce0670c14d8 "vfs: add
cross-rename".

The following change to switch_names is the culprit:

-       memcpy(dentry->d_iname, target->d_name.name,
-                   target->d_name.len + 1);
-       dentry->d_name.len = target->d_name.len;
-       return;
+       unsigned int i;
+       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
+       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+               swap(((long *) &dentry->d_iname)[i],
+                    ((long *) &target->d_iname)[i]);
+       }
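
To see why the swap yields the ".new" name, here is a user-space toy model (not kernel code) of the short-name case. struct toy_dentry, toy_switch_names() and toy_exe_after_mv() are hypothetical names, and TOY_INLINE_LEN is an assumed stand-in for DNAME_INLINE_LEN. Passing exchange = 1 mimics what 3.15 does for every rename; passing 0 mimics the old memcpy. Using the swap only for RENAME_EXCHANGE would be one plausible way to keep cross-rename while restoring the old result for plain renames.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed toy size for the inline name buffer. */
#define TOY_INLINE_LEN 32

/* Toy dentry holding only a short, inline name. */
struct toy_dentry {
    char d_iname[TOY_INLINE_LEN];
};

/*
 * exchange != 0: swap both buffers (the 3.15 behaviour, needed by
 * cross-rename). exchange == 0: copy target's name over dentry's
 * (the pre-3.15 behaviour), so target keeps its own name.
 */
void toy_switch_names(struct toy_dentry *dentry, struct toy_dentry *target,
                      int exchange)
{
    char tmp[TOY_INLINE_LEN];

    if (exchange) {
        memcpy(tmp, dentry->d_iname, sizeof(tmp));
        memcpy(dentry->d_iname, target->d_iname, sizeof(tmp));
        memcpy(target->d_iname, tmp, sizeof(tmp));
    } else {
        memcpy(dentry->d_iname, target->d_iname, sizeof(tmp));
    }
}

/*
 * Model "mv testbin.new testbin": 'dentry' is the source (testbin.new),
 * 'target' is the overwritten testbin that /proc/<pid>/exe still refers to.
 * Writes the name the exe link would then report into 'out'.
 */
void toy_exe_after_mv(int exchange, char *out, size_t n)
{
    struct toy_dentry dentry = { "testbin.new" };
    struct toy_dentry target = { "testbin" };

    toy_switch_names(&dentry, &target, exchange);
    assert(strcmp(dentry.d_iname, "testbin") == 0); /* source always renamed */
    snprintf(out, n, "%s (deleted)", target.d_iname);
}
```

With exchange = 0 the reported name is "testbin (deleted)"; with exchange = 1 it is "testbin.new (deleted)", matching the two kernel versions in the report.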


Dentries can have names from embedded structure or from an external buffer.

If you take a look around you will see the code just swaps pointers for
the "both external" case. But this results in the same behaviour you are seeing.

Not sure how to fix it. The name in 'target' needs to be preserved, but the
memory allocation that may be needed for this purpose can fail, and switch_names
returns void, just like its callers (not to mention the locks held around this).

One crap idea would be to have external name buffers with a reference counter.
d_name.name would still point into the buffer, and the freeing functions would
use container_of to get to the counter.

I can implement that later if it sounds sane enough.
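
The refcounted-buffer idea can be sketched in user space as follows, assuming C11 atomics in place of the kernel's atomic_t (or kref) and malloc/free in place of the kernel allocators. struct ext_name and the ext_name_* functions are hypothetical names. The dentry would store only the pointer to name[]; container_of() recovers the header from it, so the dentry itself needs no extra field. On a rename, the renamed dentry could take an extra reference on the target's external name instead of taking it over, so the target keeps its old name until both references are dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Kernel-style container_of, recreated for user space. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical refcounted external name: a header followed by the bytes. */
struct ext_name {
    atomic_int refcount;
    char name[];                 /* flexible array member */
};

/* Allocate a copy of s with one reference; the dentry keeps the returned pointer. */
const char *ext_name_alloc(const char *s)
{
    size_t len = strlen(s);
    struct ext_name *p = malloc(sizeof(*p) + len + 1);

    if (!p)
        return NULL;
    atomic_init(&p->refcount, 1);
    memcpy(p->name, s, len + 1);
    return p->name;
}

/* Take another reference so a second dentry can point at the same bytes. */
const char *ext_name_get(const char *name)
{
    struct ext_name *p = container_of(name, struct ext_name, name);

    atomic_fetch_add(&p->refcount, 1);
    return name;
}

/* Drop a reference; frees the buffer on the last put and returns 1 then. */
int ext_name_put(const char *name)
{
    struct ext_name *p = container_of(name, struct ext_name, name);

    if (atomic_fetch_sub(&p->refcount, 1) == 1) {
        free(p);
        return 1;
    }
    return 0;
}

/* Current number of references (for illustration and testing only). */
int ext_name_refs(const char *name)
{
    return atomic_load(&container_of(name, struct ext_name, name)->refcount);
}
```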

Note that this behaviour seems to be a requirement for cross-rename to work.

At least restoring the previous behaviour while keeping cross-rename is not
hard; I can write it later.

-- 
Mateusz Guzik


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-06 23:02 ` Richard Weinberger
@ 2014-09-07 10:20   ` Piotr Karbowski
  0 siblings, 0 replies; 8+ messages in thread
From: Piotr Karbowski @ 2014-09-07 10:20 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: LKML

Hi,

On 09/07/2014 01:02 AM, Richard Weinberger wrote:
> Can you please find the offending commit?
> Using an automated bisect run this should be easily doable.

That would be it:

da1ce0670c14d8380e423a3239e562a1dc15fa9e is the first bad commit
commit da1ce0670c14d8380e423a3239e562a1dc15fa9e
Author: Miklos Szeredi <mszeredi@suse.cz>
Date:   Tue Apr 1 17:08:43 2014 +0200

     vfs: add cross-rename

     If flags contain RENAME_EXCHANGE then exchange source and destination files.
     There's no restriction on the type of the files; e.g. a directory can be
     exchanged with a symlink.

     Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
     Reviewed-by: Jan Kara <jack@suse.cz>
     Reviewed-by: J. Bruce Fields <bfields@redhat.com>

:040000 040000 86e9be1a42cb91c1068b76700c74ec7fdba5443c e0269a1fffefe60cbd6d56ccab6485ff383d728d M	fs
:040000 040000 e677e31ebb0a5b355cefe199cba9453800624905 7a083642c2ea0698d85bfc9e65b99e8cfb4a7440 M	include
:040000 040000 266252c531debd56ae52e000ed1579dca19ddb35 bfcb75158276de3feafd45d726f3297dfc139389 M	security

-- Piotr.


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-07  7:56 ` Mateusz Guzik
@ 2014-09-11 18:37   ` Piotr Karbowski
  2014-09-11 23:39   ` Chuck Ebbert
  1 sibling, 0 replies; 8+ messages in thread
From: Piotr Karbowski @ 2014-09-11 18:37 UTC (permalink / raw)
  To: Mateusz Guzik, Al Viro; +Cc: linux-kernel, linux-fsdevel

Hi Mateusz,

> I can implement that later if it sounds sane enough.
> Note that this behaviour seems to be a requirement for cross-rename to work.
>
> At least restoring the previous behaviour while keeping cross-rename is not
> hard; I can write it later.

Have you found a proper solution for this issue, or anything that could
push a fix forward?

Thankfully this issue has been addressed by a workaround in fs/dcache.c
in the grsecurity patch for 3.16.2
(grsecurity-3.0-3.16.2-201409082129.patch), but it would be great to have
a fix upstream.

-- Piotr.


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-07  7:56 ` Mateusz Guzik
  2014-09-11 18:37   ` Piotr Karbowski
@ 2014-09-11 23:39   ` Chuck Ebbert
  2014-09-11 23:57     ` Mateusz Guzik
  1 sibling, 1 reply; 8+ messages in thread
From: Chuck Ebbert @ 2014-09-11 23:39 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Piotr Karbowski, Al Viro, linux-kernel, linux-fsdevel,
	Miklos Szeredi

On Sun, 7 Sep 2014 09:56:08 +0200
Mateusz Guzik <mguzik@redhat.com> wrote:

> On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
> > Hi,
> > 
> > Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ acts differently
> > than it used to in all the pre-3.15 kernels.
> > 
> > The usecase:
> > 
> > run /root/testbin (app that just sleeps)
> > cp /root/testbin /root/testbin.new
> > mv /root/testbin.new /root/testbin
> > ls -al /proc/`pidof testbin`/exe
> > 
> > <=3.14: /root/testbin (deleted)
> > >=3.15: /root/testbin.new (deleted)
> > 
> > Was the change intentional? It does render my system unusable and I failed
> > to find any information about such a change in the ChangeLog.
> > 
> 
> It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
> names.
> 
> Short names share the problem since da1ce0670c14d8 "vfs: add
> cross-rename".
> 
> The following change to switch_names is the culprit:
> 
> -       memcpy(dentry->d_iname, target->d_name.name,
> -                   target->d_name.len + 1);
> -       dentry->d_name.len = target->d_name.len;
> -       return;
> +       unsigned int i;
> +       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> +       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
> +               swap(((long *) &dentry->d_iname)[i],
> +                    ((long *) &target->d_iname)[i]);
> +       }
> 
> 
> Dentries can have names from embedded structure or from an external buffer.
> 
> If you take a look around you will see the code just swaps pointers for
> the "both external" case. But this results in the same behaviour you are seeing.
> 

Looks like the real problem here is that __d_materialise_dentry() needs the
old behavior of switch_names(). At least that's how it got fixed in grsecurity.


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-11 23:39   ` Chuck Ebbert
@ 2014-09-11 23:57     ` Mateusz Guzik
  2014-09-15 14:14       ` Miklos Szeredi
  0 siblings, 1 reply; 8+ messages in thread
From: Mateusz Guzik @ 2014-09-11 23:57 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Piotr Karbowski, Al Viro, linux-kernel, linux-fsdevel,
	Miklos Szeredi

On Thu, Sep 11, 2014 at 06:39:58PM -0500, Chuck Ebbert wrote:
> On Sun, 7 Sep 2014 09:56:08 +0200
> Mateusz Guzik <mguzik@redhat.com> wrote:
> 
> > On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
> > > Hi,
> > > 
> > > Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ acts differently
> > > than it used to in all the pre-3.15 kernels.
> > > 
> > > The usecase:
> > > 
> > > run /root/testbin (app that just sleeps)
> > > cp /root/testbin /root/testbin.new
> > > mv /root/testbin.new /root/testbin
> > > ls -al /proc/`pidof testbin`/exe
> > > 
> > > <=3.14: /root/testbin (deleted)
> > > >=3.15: /root/testbin.new (deleted)
> > > 
> > > Was the change intentional? It does render my system unusable and I failed
> > > to find any information about such a change in the ChangeLog.
> > > 
> > 
> > It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
> > names.
> > 
> > Short names share the problem since da1ce0670c14d8 "vfs: add
> > cross-rename".
> > 
> > The following change to switch_names is the culprit:
> > 
> > -       memcpy(dentry->d_iname, target->d_name.name,
> > -                   target->d_name.len + 1);
> > -       dentry->d_name.len = target->d_name.len;
> > -       return;
> > +       unsigned int i;
> > +       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> > +       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
> > +               swap(((long *) &dentry->d_iname)[i],
> > +                    ((long *) &target->d_iname)[i]);
> > +       }
> > 
> > 
> > Dentries can have names from embedded structure or from an external buffer.
> > 
> > If you take a look around you will see the code just swaps pointers for
> > the "both external" case. But this results in the same behaviour you are seeing.
> > 
> 
> Looks like the real problem here is that __d_materialise_dentry() needs the
> old behavior of switch_names(). At least that's how it got fixed in grsecurity.

No.

The regression in question is an effect of the swap that replaced memcpy in
switch_names, as called by d_move. The fix in grsecurity reverts to the
previous behaviour when needed and imho should be applied for the time being.

The real problem is that __d_move always switches the parent dentry and
calls switch_names, which actually switches names in some cases.

Even with the pre-regression behaviour you get the expected results only for
short names, and only when you move stuff around within the same directory.

For instance, with the current code:
mv /foo/bar/baz /1/2/3

will replace the whole path of the overwritten dentry (it becomes /foo/bar/baz).

The previous behaviour would result in /foo/bar/3 as the new path, which is
clearly still incorrect.
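
The three outcomes can be reproduced with a toy tree of (name, parent) pairs. This is a hypothetical user-space model (toy_node, toy_path and toy_mv_target_path are invented names) that ignores hashing, locking and reference counting in the real __d_move(); it only shows which fields get exchanged or copied, and what path the overwritten /1/2/3 dentry reports afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy dentry: a name plus a parent pointer; the root has parent == NULL. */
struct toy_node {
    const char *name;
    struct toy_node *parent;
};

/* Build "/a/b/c" by walking parent pointers (a toy stand-in for d_path()). */
void toy_path(const struct toy_node *n, char *buf, size_t len)
{
    char prefix[256];

    if (!n->parent) {               /* the root contributes nothing */
        if (len)
            buf[0] = '\0';
        return;
    }
    toy_path(n->parent, prefix, sizeof(prefix));
    snprintf(buf, len, "%s/%s", prefix, n->name);
}

enum toy_mode {
    TOY_SWAP_ALL,      /* 3.15: parents and names exchanged */
    TOY_SWAP_PARENT,   /* pre-3.15: parents exchanged, short name copied */
    TOY_KEEP_TARGET    /* target left under its old parent with its old name */
};

/* Model "mv /foo/bar/baz /1/2/3"; write the overwritten target's path to buf. */
void toy_mv_target_path(enum toy_mode mode, char *buf, size_t len)
{
    struct toy_node root = { "", NULL };
    struct toy_node foo = { "foo", &root }, bar = { "bar", &foo };
    struct toy_node baz = { "baz", &bar };
    struct toy_node one = { "1", &root }, two = { "2", &one };
    struct toy_node three = { "3", &two };
    struct toy_node *dentry = &baz, *target = &three, *p;
    const char *s;
    char moved[256];

    switch (mode) {
    case TOY_SWAP_ALL:
        p = dentry->parent; dentry->parent = target->parent; target->parent = p;
        s = dentry->name; dentry->name = target->name; target->name = s;
        break;
    case TOY_SWAP_PARENT:
        p = dentry->parent; dentry->parent = target->parent; target->parent = p;
        dentry->name = target->name;
        break;
    case TOY_KEEP_TARGET:
        dentry->parent = target->parent;
        dentry->name = target->name;
        break;
    }
    toy_path(dentry, moved, sizeof(moved));
    assert(strcmp(moved, "/1/2/3") == 0);   /* the moved file is always right */
    toy_path(target, buf, len);
}
```

In this model TOY_SWAP_ALL leaves the target at /foo/bar/baz, TOY_SWAP_PARENT at /foo/bar/3, and only TOY_KEEP_TARGET gives /1/2/3.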

Leaving the old dentry under its old parent would mean that the "tree"
associated with the now-moved dentry may need to be freed.

In addition, one has to deal with giving the renamed dentry the new name,
which possibly came from an external buffer. The idea I came up with
(atomic_t refcount; char name[0]; with ->name assigned to the dentry) may
require adding an additional field to struct dentry, which would be bad.

I haven't had the time yet to look at this stuff properly.

-- 
Mateusz Guzik


* Re: /proc/<pid>/exe symlink behavior change in >=3.15.
  2014-09-11 23:57     ` Mateusz Guzik
@ 2014-09-15 14:14       ` Miklos Szeredi
  0 siblings, 0 replies; 8+ messages in thread
From: Miklos Szeredi @ 2014-09-15 14:14 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: Chuck Ebbert, Piotr Karbowski, Al Viro, LKML, linux-fsdevel

On Fri, Sep 12, 2014 at 1:57 AM, Mateusz Guzik <mguzik@redhat.com> wrote:
> On Thu, Sep 11, 2014 at 06:39:58PM -0500, Chuck Ebbert wrote:
>> On Sun, 7 Sep 2014 09:56:08 +0200
>> Mateusz Guzik <mguzik@redhat.com> wrote:
>>
>> > On Sat, Sep 06, 2014 at 11:44:32PM +0200, Piotr Karbowski wrote:
>> > > Hi,
>> > >
>> > > Starting with kernel 3.15 the 'exe' symlink under /proc/<pid>/ acts differently
>> > > than it used to in all the pre-3.15 kernels.
>> > >
>> > > The usecase:
>> > >
>> > > run /root/testbin (app that just sleeps)
>> > > cp /root/testbin /root/testbin.new
>> > > mv /root/testbin.new /root/testbin
>> > > ls -al /proc/`pidof testbin`/exe
>> > >
>> > > <=3.14: /root/testbin (deleted)
>> > > >=3.15: /root/testbin.new (deleted)
>> > >
>> > > Was the change intentional? It does render my system unusable and I failed
>> > > to find any information about such a change in the ChangeLog.

Piotr, what exactly happens?  How does this break your system?

>> > >
>> >
>> > It looks like this was already broken for "long" (> DNAME_INLINE_LEN)
>> > names.
>> >
>> > Short names share the problem since da1ce0670c14d8 "vfs: add
>> > cross-rename".
>> >
>> > The following change to switch_names is the culprit:
>> >
>> > -       memcpy(dentry->d_iname, target->d_name.name,
>> > -                   target->d_name.len + 1);
>> > -       dentry->d_name.len = target->d_name.len;
>> > -       return;
>> > +       unsigned int i;
>> > +       BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
>> > +       for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
>> > +                     swap(((long *) &dentry->d_iname)[i],
>> > +                   ((long *) &target->d_iname)[i]);
>> > +       }
>> >
>> >
>> > Dentries can have names from embedded structure or from an external buffer.
>> >
>> > If you take a look around you will see the code just swaps pointers for
>> > the "both external" case. But this results in the same behaviour you are seeing.
>> >
>>
>> Looks like the real problem here is that __d_materialise_dentry() needs the
>> old behavior of switch_names(). At least that's how it got fixed in grsecurity.
>
> No.
>
> The regression in question is an effect of the swap that replaced memcpy in
> switch_names, as called by d_move. The fix in grsecurity reverts to the
> previous behaviour when needed and imho should be applied for the time being.

Ack for that.  Linus will happily take this on the grounds of backward
compatibility, even if the old behavior was arguably more crazy than
the new one.

>
> The real problem is that __d_move always switches the parent dentry and
> calls switch_names, which actually switches names in some cases.
>
> Even with the pre-regression behaviour you get the expected results only for
> short names, and only when you move stuff around within the same directory.
>
> For instance, with the current code:
> mv /foo/bar/baz /1/2/3
>
> will replace the whole path of the overwritten dentry (it becomes /foo/bar/baz).
>
> The previous behaviour would result in /foo/bar/3 as the new path, which is
> clearly still incorrect.
>
> Leaving the old dentry under its old parent would mean that the "tree"
> associated with the now-moved dentry may need to be freed.

It's done by dput().  But callers need to hold a ref to the old parent
anyway (because of locking), so it's not going to go away in d_move(),
only after everything is done.

>
> In addition, one has to deal with giving the renamed dentry the new name,
> which possibly came from an external buffer. The idea I came up with
> (atomic_t refcount; char name[0]; with ->name assigned to the dentry) may
> require adding an additional field to struct dentry, which would be bad.

You can do that without an extra field: e.g. use container_of() to get
the refcounted struct from the name.

Consider using kref for refcounting.

Thanks,
Miklos


end of thread

Thread overview: 8+ messages
2014-09-06 21:44 /proc/<pid>/exe symlink behavior change in >=3.15 Piotr Karbowski
2014-09-06 23:02 ` Richard Weinberger
2014-09-07 10:20   ` Piotr Karbowski
2014-09-07  7:56 ` Mateusz Guzik
2014-09-11 18:37   ` Piotr Karbowski
2014-09-11 23:39   ` Chuck Ebbert
2014-09-11 23:57     ` Mateusz Guzik
2014-09-15 14:14       ` Miklos Szeredi
