public inbox for llvm@lists.linux.dev
 help / color / mirror / Atom feed
* CFI support for type punning?
@ 2023-11-18  0:12 Kent Overstreet
  2023-11-18  5:07 ` Kees Cook
  0 siblings, 1 reply; 3+ messages in thread
From: Kent Overstreet @ 2023-11-18  0:12 UTC (permalink / raw)
  To: llvm; +Cc: Kees Cook, Sami Tolvanen, Nathan Chancellor, Nick Desaulniers,
	Tom Rix

I received the following report

[  228.366640] bcachefs (loop0): mounting version 1.3: rebalance_work
[  228.366652] bcachefs (loop0): initializing new filesystem
[  228.366910] bcachefs (loop0): going read-write
[  228.367619] CFI failure at process_scheduled_works+0x273/0x4a0 (target: btree_update_set_nodes_written+0x0/0x50; expected type: 0xb86001b6 
[  228.367645] WARNING: CPU: 3 PID: 11 at process_scheduled_works (kernel/workqueue.c:2630 kernel/workqueue.c:2703) 
[  228.367653] CPU: 3 PID: 11 Comm: kworker/u48:0 Tainted: G                T  6.7.0-rc1+ #1
[  228.367660] Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS Dasharo (coreboot+UEFI) v1.1.1 02/22/2023
[  228.367663] Workqueue: btree_update btree_update_set_nodes_written
[  228.367671] RIP: 0010:process_scheduled_works (kernel/workqueue.c:2630 kernel/workqueue.c:2703) 
[ 228.367679] Code: 24 fc 00 49 ff 87 98 00 00 00 4c 89 ef e8 e5 3e 77 01 0f 1f 44 00 00 4c 8b 5b 18 4c 89 f7 41 ba 4a fe 9f 47 45 03 53 f1 74 02 <0f> 0b 41 ff d3 0f 1f 00 48 8b 53 18 0f 1f 44 00 00 49 ff 87 a0 00
All code
========

Code starting with the faulting instruction
===========================================
[  228.367683] RSP: 0018:ffffc90000093e58 EFLAGS: 00010213
[  228.367688] RAX: 0000000000000000 RBX: ffff88810018a900 RCX: ffff88815082c605
[  228.367692] RDX: ffff888100216428 RSI: 0000000000000000 RDI: ffff88810d41e000
[  228.367694] RBP: ffff888140bddcb0 R08: 0000000000000000 R09: 0000000000000000
[  228.367697] R10: 000000003c094208 R11: ffffffff8814ea20 R12: ffff88810d41e008
[  228.367699] R13: ffff888100216400 R14: ffff88810d41e000 R15: ffff88815082c600
[  228.367702] FS:  0000000000000000(0000) GS:ffff88888f380000(0000) knlGS:0000000000000000
[  228.367706] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  228.367709] CR2: 00007f884730dff0 CR3: 000000014088c000 CR4: 0000000000b50ef0
[  228.367713] Call Trace:
[  228.367716]  <TASK>
[  228.367719] ? __warn (kernel/panic.c:236 kernel/panic.c:677) 
[  228.367726] ? process_scheduled_works (kernel/workqueue.c:2630 kernel/workqueue.c:2703) 
[  228.367733] ? report_cfi_failure (kernel/cfi.c:22) 
[  228.367740] ? handle_cfi_failure (arch/x86/kernel/cfi.c:80) 
[  228.367746] ? __cfi_btree_update_set_nodes_written (fs/bcachefs/btree_update_interior.c:786) 
[  228.367753] ? handle_bug (arch/x86/kernel/traps.c:238) 
[  228.367760] ? exc_invalid_op (arch/x86/kernel/traps.c:258) 
[  228.367766] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568) 
[  228.367772] ? __cfi_btree_update_set_nodes_written (fs/bcachefs/btree_update_interior.c:786) 
[  228.367778] ? process_scheduled_works (kernel/workqueue.c:2630 kernel/workqueue.c:2703) 
[  228.367784] ? process_scheduled_works (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/workqueue.h:82 kernel/workqueue.c:2629 kernel/workqueue.c:2703) 
[  228.367790] worker_thread (./include/linux/list.h:373 kernel/workqueue.c:841 kernel/workqueue.c:2785) 
[  228.367796] ? __cfi_worker_thread (kernel/workqueue.c:2730) 
[  228.367802] kthread (kernel/kthread.c:388) 
[  228.367809] ? __cfi_kthread (kernel/kthread.c:341) 
[  228.367816] ret_from_fork (arch/x86/kernel/process.c:147) 
[  228.367820] ? __cfi_kthread (kernel/kthread.c:341) 
[  228.367826] ret_from_fork_asm (arch/x86/entry/entry_64.S:250) 
[  228.367831]  </TASK>
[  228.367833] ---[ end trace 0000000000000000 ]---
[  228.368179] bcachefs (loop0): initializing freespace
[  228.369347] CFI failure at process_scheduled_works+0x273/0x4a0 (target: bch2_journal_write+0x0/0x1040; expected type: 0xb86001b6 
[  228.369370] WARNING: CPU: 19 PID: 496 at process_scheduled_works (kernel/workqueue.c:2630 kernel/workqueue.c:2703) 
[  228.369380] CPU: 19 PID: 496 Comm: kworker/19:1H Tainted: P                T  6.7.0-rc1+ #1
[  228.369388] Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS Dasharo (coreboot+UEFI) v1.1.1 02/22/2023
[  228.369391] Workqueue: bcachefs_io bch2_journal_write
[  228.369400] RIP: 0010:process_scheduled_works (kernel/workqueue.c:2630 kernel/workqueue.c:2703) 
[ 228.369408] Code: 24 fc 00 49 ff 87 98 00 00 00 4c 89 ef e8 e5 3e 77 01 0f 1f 44 00 00 4c 8b 5b 18 4c 89 f7 41 ba 4a fe 9f 47 45 03 53 f1 74 02 <0f> 0b 41 ff d3 0f 1f 00 48 8b 53 18 0f 1f 44 00 00 49 ff 87 a0 00
All code

which indicates that bcache and bcachefs are both broken with CFI
enabled; closures (lib/closure.c, include/linux/closure.h) do type
punning with passing a void (closure_fn)(struct closure *) to the
workqueue code, which calls it as a void (workqueue_fn)(struct
work_struct *). That is, we're considering the closure to be the same as
the embedded/unioned work_struct, as far as the workqueue code is
considered.

Is there any way to tell the CFI code about this kind of type punning,
or would it be possible to add support for this?

The alternatives I'm looking at aren't terribly palatable; we'd either
have to increase the size of struct closure with another function
pointer (and another indirect function call on every use); or we'd have
to patch the workqueue code and add a flag (Tejun won't like that), or
I'll have to convert every closure function to take a work_struct arg -
with a resulting decrease in clarity of types.

Cheers,
Kent

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: CFI support for type punning?
  2023-11-18  0:12 CFI support for type punning? Kent Overstreet
@ 2023-11-18  5:07 ` Kees Cook
  2023-11-18 20:27   ` Kent Overstreet
  0 siblings, 1 reply; 3+ messages in thread
From: Kees Cook @ 2023-11-18  5:07 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: llvm, Sami Tolvanen, Nathan Chancellor, Nick Desaulniers, Tom Rix

On Fri, Nov 17, 2023 at 07:12:08PM -0500, Kent Overstreet wrote:
> [  228.367619] CFI failure at process_scheduled_works+0x273/0x4a0 (target: btree_update_set_nodes_written+0x0/0x50; expected type: 0xb86001b6 
> [  228.367645] WARNING: CPU: 3 PID: 11 at process_scheduled_works (kernel/workqueue.c:2630 kernel/workqueue.c:2703) 
> [...]

Hunh. This is the first report I can find for bcache being tested under
CFI! Dang, perhaps 0day or syzkaller need to grow some bcache (and now
bcachefs) exercises.

> which indicates that bcache and bcachefs are both broken with CFI
> enabled; closures (lib/closure.c, include/linux/closure.h) do type
> punning with passing a void (closure_fn)(struct closure *) to the
> workqueue code, which calls it as a void (workqueue_fn)(struct
> work_struct *). That is, we're considering the closure to be the same as
> the embedded/unioned work_struct, as far as the workqueue code is
> considered.
> 
> Is there any way to tell the CFI code about this kind of type punning,
> or would it be possible to add support for this?

There is no type punning support, though a neighboring idea has been
proposed to define separate groups of the same prototypes (e.g. so one
work_struct user can't call another work_struct user's callbacks). But
that's all future work.

> The alternatives I'm looking at aren't terribly palatable; we'd either
> have to increase the size of struct closure with another function
> pointer (and another indirect function call on every use); or we'd have
> to patch the workqueue code and add a flag (Tejun won't like that), or
> I'll have to convert every closure function to take a work_struct arg -
> with a resulting decrease in clarity of types.

I think the reason the existing compiler warnings we have in place for
this didn't warn was because there was no explicit casting; it was all
hiding in an intentionally overlapped union. :)

struct closure {
        union {
                struct {
                        struct workqueue_struct *wq;
                        struct closure_syncer   *s;
                        struct llist_node       list;
                        closure_fn              *fn;
                };
                struct work_struct      work;
        };

...
        BUILD_BUG_ON(offsetof(struct closure, fn)
                     != offsetof(struct work_struct, func));

It seems the assignment used "fn", and the WORK_INIT used its internal
func for the call. No casting needed. :)

The good news is that closure.h _is_ using the common code pattern for
these kinds of things (container_of() to find the wrapping struct),
but it's using "struct closure" instead of "struct work_struct", which
is what you concluded as well.

Part of the work we did over the last few years now was fixing cases
like this (where the wrapping function was being passed instead of
the work_struct), and it just needed a few localized refactorings
that Coccinelle seemed to handle well. I don't think it should be too
disruptive here either.

static void journal_write_done(struct closure *cl)
{
        struct journal *j = container_of(cl, struct journal, io);

Would, I think, become:

static void journal_write_done(struct work_struct *ws)
{
	struct closure *cl = container_of(ws, struct work_struct, work);
        struct journal *j = container_of(cl, struct journal, io);

Usually these kinds of changes resulted in no binary differences (as
expected). I do agree, though, the ergonomics are a little weird to
expose the internals of closure.h. I've seen other APIs paper over this
in their own special ways. For example, maybe something like this:

#define CLOSURE_CALLBACK(name)		\
	static void name(struct work_struct *ws)

#define closure_type(type, member, name)	\
	type *name = container_of(ws, struct work_struct, member.work)

CLOSURE_CALLBACK(journal_write_done)
{
	closure_type(struct journal, io, j);
	...

The less common code pattern used for fixing CFI issues, as you also
considered, was a wrapper function. (Which would need space in the struct
for the separate callback, as you figured.)

Another example, though not a work_struct refactor, was what we did
in NFS last year (which used Coccinelle):
https://lore.kernel.org/lkml/20221202204854.gonna.644-kees@kernel.org/

And another, which was quite painful, was the struct timer_list work in
2018, which touched something like 1000 files. :|
https://outflux.net/blog/archives/2018/02/05/security-things-in-linux-v4-15/#v4.15-timer_list

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: CFI support for type punning?
  2023-11-18  5:07 ` Kees Cook
@ 2023-11-18 20:27   ` Kent Overstreet
  0 siblings, 0 replies; 3+ messages in thread
From: Kent Overstreet @ 2023-11-18 20:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: llvm, Sami Tolvanen, Nathan Chancellor, Nick Desaulniers, Tom Rix

On Fri, Nov 17, 2023 at 09:07:06PM -0800, Kees Cook wrote:
> There is no type punning support, though a neighboring idea has been
> proposed to define separate groups of the same prototypes (e.g. so one
> work_struct user can't call another work_struct user's callbacks). But
> that's all future work.

There was a proposal for making container_of() typesafe awhile back (I
forget who came up with it - Matthew, maybe?). Related?

> 
> > The alternatives I'm looking at aren't terribly palatable; we'd either
> > have to increase the size of struct closure with another function
> > pointer (and another indirect function call on every use); or we'd have
> > to patch the workqueue code and add a flag (Tejun won't like that), or
> > I'll have to convert every closure function to take a work_struct arg -
> > with a resulting decrease in clarity of types.
> 
> I think the reason the existing compiler warnings we have in place for
> this didn't warn was because there was no explicit casting; it was all
> hiding in an intentionally overlapped union. :)

;)

> struct closure {
>         union {
>                 struct {
>                         struct workqueue_struct *wq;
>                         struct closure_syncer   *s;
>                         struct llist_node       list;
>                         closure_fn              *fn;
>                 };
>                 struct work_struct      work;
>         };
> 
> ...
>         BUILD_BUG_ON(offsetof(struct closure, fn)
>                      != offsetof(struct work_struct, func));
> 
> It seems the assignment used "fn", and the WORK_INIT used its internal
> func for the call. No casting needed. :)
> 
> The good news is that closure.h _is_ using the common code pattern for
> these kinds of things (container_of() to find the wrapping struct),
> but it's using "struct closure" instead of "struct work_struct", which
> is what you concluded as well.
> 
> Part of the work we did over the last few years now was fixing cases
> like this (where the wrapping function was being passed instead of
> the work_struct), and it just needed a few localized refactorings
> that Coccinelle seemed to handle well. I don't think it should be too
> disruptive here either.
> 
> static void journal_write_done(struct closure *cl)
> {
>         struct journal *j = container_of(cl, struct journal, io);
> 
> Would, I think, become:
> 
> static void journal_write_done(struct work_struct *ws)
> {
> 	struct closure *cl = container_of(ws, struct work_struct, work);
>         struct journal *j = container_of(cl, struct journal, io);

The thing I don't like about this change is that closure fns are
significantly semantically different from a normal workqueue function -
a closure fn runs with a refcount held on the closure that it's
responsible for releasing either via continue_at() or closure_return().

> Usually these kinds of changes resulted in no binary differences (as
> expected). I do agree, though, the ergonomics are a little weird to
> expose the internals of closure.h. I've seen other APIs paper over this
> in their own special ways. For example, maybe something like this:
> 
> #define CLOSURE_CALLBACK(name)		\
> 	static void name(struct work_struct *ws)
> 
> #define closure_type(type, member, name)	\
> 	type *name = container_of(ws, struct work_struct, member.work)
> 
> CLOSURE_CALLBACK(journal_write_done)
> {
> 	closure_type(struct journal, io, j);

Yeah, I think this works.

> And another, which was quite painful, was the struct timer_list work in
> 2018, which touched something like 1000 files. :|
> https://outflux.net/blog/archives/2018/02/05/security-things-in-linux-v4-15/#v4.15-timer_list

Thanks for the links :)

Related to the timer list change - back when I was active in the block
layer I was contemplating getting rid of bi_private, in favour of
container_of(). I wonder if that's something we could start nudging
people to do, less pointer chasing is always good :)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-11-18 20:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-18  0:12 CFI support for type punning? Kent Overstreet
2023-11-18  5:07 ` Kees Cook
2023-11-18 20:27   ` Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox