From: Ming Lei <ming.lei@redhat.com>
To: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, nbd@other.debian.org,
	linux-security-module@vger.kernel.org, selinux@vger.kernel.org
Subject: Re: [PATCH] nbd: override creds to kernel when calling sock_{send,recv}msg()
Date: Fri, 10 Oct 2025 09:56:24 +0800
Message-ID: <aOhnyMtw0a0fqaNO@fedora>
In-Reply-To: <20251009134542.1529148-1-omosnace@redhat.com>

On Thu, Oct 09, 2025 at 03:45:42PM +0200, Ondrej Mosnacek wrote:
> sock_{send,recv}msg() internally calls security_socket_{send,recv}msg(),
> which does security checks (e.g. SELinux) for socket access against the
> current task. However, _sock_xmit() in drivers/block/nbd.c may be called
> indirectly from a userspace syscall, where the NBD socket access would
> be incorrectly checked against the calling userspace task (which simply
> tries to read/write a file that happens to reside on an NBD device).
> 
> To fix this, temporarily override creds to kernel ones before calling
> the sock_*() functions. This allows the security modules to recognize
> this as internal access by the kernel, which will normally be allowed.
> 
> A way to trigger the issue is to do the following (on a system with
> SELinux set to enforcing):
> 
>     ### Create nbd device:
>     truncate -s 256M /tmp/testfile
>     nbd-server localhost:10809 /tmp/testfile
> 
>     ### Connect to the nbd server:
>     nbd-client localhost
> 
>     ### Create mdraid array
>     mdadm --create -l 1 -n 2 /dev/md/testarray /dev/nbd0 missing

-EACCES is triggered when the mdadm process reads data:

@security[mdadm, -13,
        handshake_exit+221615650
        handshake_exit+221615650
        handshake_exit+221616465
        security_socket_sendmsg+5
        sock_sendmsg+106
        handshake_exit+221616150
        sock_sendmsg+5
        __sock_xmit+162
        nbd_send_cmd+597
        nbd_handle_cmd+377
        nbd_queue_rq+63
        blk_mq_dispatch_rq_list+653
        __blk_mq_do_dispatch_sched+184
        __blk_mq_sched_dispatch_requests+333
        blk_mq_sched_dispatch_requests+38
        blk_mq_run_hw_queue+239
        blk_mq_dispatch_plug_list+382
        blk_mq_flush_plug_list.part.0+55
        __blk_flush_plug+241
        __submit_bio+353
        submit_bio_noacct_nocheck+364
        submit_bio_wait+84
        __blkdev_direct_IO_simple+232
        blkdev_read_iter+162
        vfs_read+591
        ksys_read+95
        do_syscall_64+92
        entry_SYSCALL_64_after_hwframe+120
]: 1

The issue has been exposed since commit f1daaaf0c1fa ("block: add plug while submitting IO").

> 
>     ### Stop the array
>     mdadm --stop /dev/md/testarray
> 
>     ### Disconnect the nbd device
>     nbd-client -d /dev/nbd0
> 
>     ### Reconnect to nbd devices:
>     nbd-client localhost

The steps above don't actually matter for reproducing the issue.

> 
> After these steps, assuming the SELinux policy doesn't allow the
> unexpected access pattern, errors will be visible on the kernel console:
> 
> [   93.997980] nbd2: detected capacity change from 0 to 524288
> [  100.314271] md/raid1:md126: active with 1 out of 2 mirrors
> [  100.314301] md126: detected capacity change from 0 to 522240
> [  100.317288] block nbd2: Send control failed (result -13)           <-----
> [  100.317306] block nbd2: Request send failed, requeueing            <-----
> [  100.318765] block nbd2: Receive control failed (result -32)        <-----
> [  100.318783] block nbd2: Dead connection, failed to find a fallback
> [  100.318794] block nbd2: shutting down sockets
> [  100.318802] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.318817] Buffer I/O error on dev md126, logical block 0, async page read
> [  100.322000] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.322016] Buffer I/O error on dev md126, logical block 0, async page read
> [  100.323244] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.323253] Buffer I/O error on dev md126, logical block 0, async page read
> [  100.324436] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.324444] Buffer I/O error on dev md126, logical block 0, async page read
> [  100.325621] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.325630] Buffer I/O error on dev md126, logical block 0, async page read
> [  100.326813] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.326822] Buffer I/O error on dev md126, logical block 0, async page read
> [  100.326834]  md126: unable to read partition table
> [  100.329872] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.329889] Buffer I/O error on dev nbd2, logical block 0, async page read
> [  100.331186] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.331195] Buffer I/O error on dev nbd2, logical block 0, async page read
> [  100.332371] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.332379] Buffer I/O error on dev nbd2, logical block 0, async page read
> [  100.333550] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [  100.333559] Buffer I/O error on dev nbd2, logical block 0, async page read
> [  100.334721]  nbd2: unable to read partition table
> [  100.350993]  nbd2: unable to read partition table
> 
> The corresponding SELinux denial on Fedora/RHEL will look like this
> (assuming it's not silenced):
> type=AVC msg=audit(1758104872.510:116): avc:  denied  { write } for  pid=1908 comm="mdadm" laddr=::1 lport=32772 faddr=::1 fport=10809 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=tcp_socket permissive=0
> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2348878
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  drivers/block/nbd.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 6463d0e8d0cef..d50055c974a6b 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -52,6 +52,7 @@
>  static DEFINE_IDR(nbd_index_idr);
>  static DEFINE_MUTEX(nbd_index_mutex);
>  static struct workqueue_struct *nbd_del_wq;
> +static struct cred *nbd_cred;
>  static int nbd_total_devices = 0;
>  
>  struct nbd_sock {
> @@ -554,6 +555,7 @@ static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
>  	int result;
>  	struct msghdr msg = {} ;
>  	unsigned int noreclaim_flag;
> +	const struct cred *old_cred;
>  
>  	if (unlikely(!sock)) {
>  		dev_err_ratelimited(disk_to_dev(nbd->disk),
> @@ -562,6 +564,8 @@ static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
>  		return -EINVAL;
>  	}
>  
> +	old_cred = override_creds(nbd_cred);
> +
>  	msg.msg_iter = *iter;
>  
>  	noreclaim_flag = memalloc_noreclaim_save();
> @@ -586,6 +590,8 @@ static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
>  
>  	memalloc_noreclaim_restore(noreclaim_flag);
>  
> +	revert_creds(old_cred);
> +
>  	return result;
>  }
>  
> @@ -2669,7 +2675,15 @@ static int __init nbd_init(void)
>  		return -ENOMEM;
>  	}
>  
> +	nbd_cred = prepare_kernel_cred(&init_task);
> +	if (!nbd_cred) {
> +		destroy_workqueue(nbd_del_wq);
> +		unregister_blkdev(NBD_MAJOR, "nbd");
> +		return -ENOMEM;
> +	}
> +
>  	if (genl_register_family(&nbd_genl_family)) {
> +		put_cred(nbd_cred);
>  		destroy_workqueue(nbd_del_wq);
>  		unregister_blkdev(NBD_MAJOR, "nbd");
>  		return -EINVAL;
> @@ -2706,6 +2720,8 @@ static void __exit nbd_cleanup(void)
>  
>  	nbd_dbg_close();
>  
> +	put_cred(nbd_cred);
> +
>  	mutex_lock(&nbd_index_mutex);
>  	idr_for_each(&nbd_index_idr, &nbd_exit_cb, &del_list);
>  	mutex_unlock(&nbd_index_mutex);

Yeah, as Stephen and Paul commented, put_cred() needs to be moved after
destroy_workqueue(nbd_del_wq), because the work function on that wq removes
the nbd disk and destroys the recv wq, so nbd_cred can still be in use
until then.
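
To illustrate the ordering concern, here is a minimal userspace sketch. It is
not kernel code: toy_cred, toy_wq, toy_put_cred(), toy_destroy_workqueue() and
toy_cleanup() are hypothetical stand-ins, and the model simply assumes that
every pending work item still touches the cred when it runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted cred; not the kernel's struct cred. */
struct toy_cred {
	int usage;	/* reference count */
	bool freed;	/* set once the last reference is dropped */
};

/* Hypothetical stand-in for a workqueue with work still queued. */
struct toy_wq {
	int pending;		/* work items that have not run yet */
	struct toy_cred *cred;	/* cred each work item is assumed to use */
	int stale_uses;		/* work items that ran after the cred was freed */
};

/* Drop one reference; mark the cred freed when the count reaches zero. */
static void toy_put_cred(struct toy_cred *c)
{
	assert(c->usage > 0);
	if (--c->usage == 0)
		c->freed = true;
}

/* Drain all pending work, counting items that see an already-freed cred. */
static void toy_destroy_workqueue(struct toy_wq *wq)
{
	while (wq->pending > 0) {
		if (wq->cred->freed)
			wq->stale_uses++;
		wq->pending--;
	}
}

/* Run the cleanup in either order and report how many stale uses occurred. */
static int toy_cleanup(int pending, bool put_cred_first)
{
	struct toy_cred cred = { .usage = 1, .freed = false };
	struct toy_wq wq = { .pending = pending, .cred = &cred, .stale_uses = 0 };

	if (put_cred_first) {
		/* Ordering in the posted patch: cred dropped before the drain. */
		toy_put_cred(&cred);
		toy_destroy_workqueue(&wq);
	} else {
		/* Suggested ordering: drain the workqueue, then drop the cred. */
		toy_destroy_workqueue(&wq);
		toy_put_cred(&cred);
	}
	return wq.stale_uses;
}
```

In this model, toy_cleanup(3, true) reports three stale uses, while
toy_cleanup(3, false) reports none.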

Otherwise, this patch looks fine from the block layer's viewpoint, and I
verified that it does fix the -EACCES failure when mdadm reads from nbd.
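
The effect of the override can be modeled in userspace roughly as follows.
This is only a sketch under stated assumptions: the model_* names are
hypothetical, the "current" cred is a plain global, and the toy hook allows
socket access only for kernel creds, which is a simplification of what a real
SELinux policy decides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for task credentials; not the kernel's struct cred. */
struct model_cred {
	const char *label;
	int is_kernel;
};

static const struct model_cred model_kernel_cred = { "kernel", 1 };

/* Models the creds of the task that is currently running. */
static const struct model_cred *model_current_cred;

/* Install new creds and hand back the previous ones for a later revert. */
static const struct model_cred *model_override_creds(const struct model_cred *new_cred)
{
	const struct model_cred *old = model_current_cred;

	model_current_cred = new_cred;
	return old;
}

/* Restore the creds saved by model_override_creds(). */
static void model_revert_creds(const struct model_cred *old)
{
	model_current_cred = old;
}

/* Toy security hook (assumption): only kernel creds may use the socket. */
static int model_security_socket_sendmsg(void)
{
	return model_current_cred->is_kernel ? 0 : -EACCES;
}

/* Models the transmit path, with or without the cred override. */
static int model_sock_xmit(int override)
{
	const struct model_cred *old = NULL;
	int result;

	if (override)
		old = model_override_creds(&model_kernel_cred);
	result = model_security_socket_sendmsg();
	if (override)
		model_revert_creds(old);
	return result;
}

/* Models a read() syscall issued by a task that ends up in the xmit path. */
static int model_read_as(const struct model_cred *task, int override)
{
	int result;

	model_current_cred = task;
	result = model_sock_xmit(override);
	/* The caller's creds must be back in place after the transmit. */
	assert(model_current_cred == task);
	return result;
}
```

With a non-kernel caller, model_read_as(&task, 0) returns -EACCES, and
model_read_as(&task, 1) returns 0 while leaving the caller's creds untouched
afterwards.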

Thanks,
Ming


Thread overview: 4+ messages
2025-10-09 13:45 [PATCH] nbd: override creds to kernel when calling sock_{send,recv}msg() Ondrej Mosnacek
2025-10-09 14:34 ` Stephen Smalley
2025-10-09 14:59   ` Paul Moore
2025-10-10  1:56 ` Ming Lei [this message]
