linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* out-of-bounds write in the function ata_pio_sector
@ 2025-01-02  2:17 reveliofuzzing
  2025-01-02 10:40 ` Niklas Cassel
  2025-01-22 14:59 ` Niklas Cassel
  0 siblings, 2 replies; 10+ messages in thread
From: reveliofuzzing @ 2025-01-02  2:17 UTC (permalink / raw)
  To: damien.lemoal, linux-ide

Hi there,

We found an out-of-bounds write in the function ata_pio_sector, which can cause
the kernel to crash. We would like to report it for your reference.

## Problem in ata_pio_sector
ata_pio_sector uses the following code to decide which page to use for the I/O:
page = sg_page(qc->cursg);
offset = qc->cursg->offset + qc->cursg_ofs;

/* get the current page and offset */
page = nth_page(page, (offset >> PAGE_SHIFT));
offset %= PAGE_SIZE;
but we found that `offset` could be as high as 0x5000---qc->cursg_ofs==0x5000,
qc->cursg->offset == 0x0, making `page` point to a higher-position page that
belongs to other threads.

## Example crash
This out-of-bound write can cause the kernel to crash at arbitrary places,
depending on when the corrupted page is accessed by the other thread.

We found this problem can happen in Linux kernel 6.1~6.12. Here is one crash in
Linux kernel 6.1:
executing program
[   15.461899] program syz-executor is using a deprecated SCSI ioctl,
please convert it to SG_IO
[   79.990338] ata1: lost interrupt (Status 0x58)
[   80.510447] ata1: found unknown device (class 0)
[   80.519176] BUG: kernel NULL pointer dereference, address: 0000000000000030
[   80.527336] #PF: supervisor read access in kernel mode
[   80.533045] #PF: error_code(0x0000) - not-present page
[   80.538614] PGD 0 P4D 0
[   80.543339] Oops: 0000 [#1] PREEMPT SMP PTI
[   80.547027] CPU: 0 PID: 195 Comm: syz-executor Not tainted
6.1.0-rc7-g29106f2b0871 #774
[   80.551328] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   80.555779] RIP: 0010:do_exit+0x34a/0xa40
[   80.557528] Code: 08 4c 89 64 24 10 e8 15 ec 7e 00 48 8b 93 78 05
00 00 48 8d 83 78 05 00 00 48 39 c2 0f 85 7e 05 00 00 48 89 df e8 26
7e 01 00 <48> 8b 68 30 49 89 c5 48 39 e0
[   80.562666] RSP: 0018:ffffc900004cbd80 EFLAGS: 00010046
[   80.564524] RAX: 0000000000000000 RBX: ffff888003b30f80 RCX: 0000000000000000
[   80.566399] RDX: 0000000000000000 RSI: ffff888003ec84a8 RDI: ffff888003b30f80
[   80.568231] RBP: ffff888003d003c0 R08: ffff888003b31870 R09: ffffc900004cbcb0
[   80.570042] R10: ffff888003b08090 R11: fffffffffffcc2c7 R12: ffffc900004cbd88
[   80.571629] R13: ffff888003d00420 R14: ffff888003b30f01 R15: ffff888003b31768
[   80.573237] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000)
knlGS:0000000000000000
[   80.574916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   80.576221] CR2: 0000000000000030 CR3: 000000000220a000 CR4: 0000000000350eb0
[   80.577652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.579075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   80.580435] Call Trace:
[   80.581092]  <TASK>
[   80.581704]  do_group_exit+0x28/0x80
[   80.582514]  get_signal+0x8de/0x910
[   80.583309]  ? sg_ioctl+0x331/0xb20
[   80.584112]  arch_do_signal_or_restart+0x1b/0x660
[   80.585093]  ? __x64_sys_ioctl+0x178/0x9b0
[   80.585973]  ? handle_mm_fault+0x6a/0x1b0
[   80.586817]  exit_to_user_mode_prepare+0x89/0x150
[   80.587766]  syscall_exit_to_user_mode+0x1d/0x40
[   80.588739]  do_syscall_64+0x50/0x90
[   80.589572]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   80.590552] RIP: 0033:0x7f5e0333aaad
[   80.591358] Code: Unable to access opcode bytes at 0x7f5e0333aa83.
[   80.592529] RSP: 002b:00007ffc19613458 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[   80.593999] RAX: 0000000000000002 RBX: 00007ffc19613480 RCX: 00007f5e0333aaad
[   80.595319] RDX: 0000000020000040 RSI: 0000000000000001 RDI: 0000000000000003
[   80.596569] RBP: 0000000000000000 R08: 0000000000000012 R09: 0000000000000000
[   80.597824] R10: 00007f5e0338803c R11: 0000000000000246 R12: 00007ffc19613490
[   80.599134] R13: 00007f5e033b2df0 R14: 0000000000000000 R15: 0000000000000000
[   80.600467]  </TASK>
[   80.601090] Modules linked in:
[   80.602009] CR2: 0000000000000030
[   80.602904] ---[ end trace 0000000000000000 ]---
[   80.603834] RIP: 0010:do_exit+0x34a/0xa40
[   80.604703] Code: 08 4c 89 64 24 10 e8 15 ec 7e 00 48 8b 93 78 05
00 00 48 8d 83 78 05 00 00 48 39 c2 0f 85 7e 05 00 00 48 89 df e8 26
7e 01 00 <48> 8b 68 30 49 89 c5 48 39 e0
[   80.607601] RSP: 0018:ffffc900004cbd80 EFLAGS: 00010046
[   80.608740] RAX: 0000000000000000 RBX: ffff888003b30f80 RCX: 0000000000000000
[   80.610073] RDX: 0000000000000000 RSI: ffff888003ec84a8 RDI: ffff888003b30f80
[   80.611394] RBP: ffff888003d003c0 R08: ffff888003b31870 R09: ffffc900004cbcb0
[   80.612717] R10: ffff888003b08090 R11: fffffffffffcc2c7 R12: ffffc900004cbd88
[   80.614027] R13: ffff888003d00420 R14: ffff888003b30f01 R15: ffff888003b31768
[   80.615335] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000)
knlGS:0000000000000000
[   80.616721] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   80.617845] CR2: 0000000000000030 CR3: 000000000220a000 CR4: 0000000000350eb0
[   80.619171] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.620385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   80.621706] note: syz-executor[195] exited with preempt_count 1
[   80.622850] Fixing recursive fault but reboot is needed!
[   80.623910] BUG: scheduling while atomic: syz-executor/195/0x00000000
[   80.625128] Modules linked in:
[   80.626096] Preemption disabled at:
[   80.626202] [<0000000000000000>] 0x0
[   80.627687] CPU: 0 PID: 195 Comm: syz-executor Tainted: G      D
        6.1.0-rc7-g29106f2b0871 #774
[   80.629278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   80.631147] Call Trace:
[   80.631834]  <TASK>
[   80.632459]  dump_stack_lvl+0x33/0x46
[   80.633285]  __schedule_bug.cold+0x7d/0x8e
[   80.634181]  __schedule+0x63b/0x700
[   80.634986]  ? _printk+0x43/0x49
[   80.635775]  do_task_dead+0x3f/0x50
[   80.636575]  make_task_dead.cold+0x51/0xab
[   80.637446]  rewind_stack_and_make_dead+0x17/0x20
[   80.638434] RIP: 0033:0x7f5e0333aaad
[   80.639306] Code: Unable to access opcode bytes at 0x7f5e0333aa83.
[   80.640466] RSP: 002b:00007ffc19613458 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[   80.641901] RAX: 0000000000000002 RBX: 00007ffc19613480 RCX: 00007f5e0333aaad
[   80.643213] RDX: 0000000020000040 RSI: 0000000000000001 RDI: 0000000000000003
[   80.644474] RBP: 0000000000000000 R08: 0000000000000012 R09: 0000000000000000
[   80.645789] R10: 00007f5e0338803c R11: 0000000000000246 R12: 00007ffc19613490
[   80.647105] R13: 00007f5e033b2df0 R14: 0000000000000000 R15: 0000000000000000
[   80.648441]  </TASK>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-02  2:17 out-of-bounds write in the function ata_pio_sector reveliofuzzing
@ 2025-01-02 10:40 ` Niklas Cassel
  2025-01-02 16:23   ` reveliofuzzing
  2025-01-22 14:59 ` Niklas Cassel
  1 sibling, 1 reply; 10+ messages in thread
From: Niklas Cassel @ 2025-01-02 10:40 UTC (permalink / raw)
  To: reveliofuzzing; +Cc: damien.lemoal, linux-ide

Hello reveliofuzzing,

On Wed, Jan 01, 2025 at 09:17:02PM -0500, reveliofuzzing wrote:
> Hi there,
> 
> We found an out-of-bounds write in the function ata_pio_sector, which can cause
> the kernel to crash. We would like to report it for your reference.
> 
> ## Problem in ata_pio_sector
> ata_pio_sector uses the following code to decide which page to use for the I/O:
> page = sg_page(qc->cursg);
> offset = qc->cursg->offset + qc->cursg_ofs;
> 
> /* get the current page and offset */
> page = nth_page(page, (offset >> PAGE_SHIFT));
> offset %= PAGE_SIZE;
> but we found that `offset` could be as high as 0x5000---qc->cursg_ofs==0x5000,
> qc->cursg->offset == 0x0, making `page` point to a higher-position page that
> belongs to other threads.
> 
> ## Example crash
> This out-of-bound write can cause the kernel to crash at arbitrary places,
> depending on when the corrupted page is accessed by the other thread.
> 
> We found this problem can happen in Linux kernel 6.1~6.12. Here is one crash in
> Linux kernel 6.1:

Thank you for reporting!

I assume that you haven't tested kernels earlier than 6.1?

(Looking at the driver, there was no major change between 6.0 and 6.1,
so this bug has probably been there for a long time.)


Could you please share your reproducer and your kernel config as well?


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-02 10:40 ` Niklas Cassel
@ 2025-01-02 16:23   ` reveliofuzzing
  2025-01-17 14:26     ` Niklas Cassel
  0 siblings, 1 reply; 10+ messages in thread
From: reveliofuzzing @ 2025-01-02 16:23 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: damien.lemoal, linux-ide

On Thu, Jan 2, 2025 at 5:40 AM Niklas Cassel <cassel@kernel.org> wrote:
>
> Hello reveliofuzzing,
>
> On Wed, Jan 01, 2025 at 09:17:02PM -0500, reveliofuzzing wrote:
> > Hi there,
> >
> > We found an out-of-bounds write in the function ata_pio_sector, which can cause
> > the kernel to crash. We would like to report it for your reference.
> >
> > ## Problem in ata_pio_sector
> > ata_pio_sector uses the following code to decide which page to use for the I/O:
> > page = sg_page(qc->cursg);
> > offset = qc->cursg->offset + qc->cursg_ofs;
> >
> > /* get the current page and offset */
> > page = nth_page(page, (offset >> PAGE_SHIFT));
> > offset %= PAGE_SIZE;
> > but we found that `offset` could be as high as 0x5000---qc->cursg_ofs==0x5000,
> > qc->cursg->offset == 0x0, making `page` point to a higher-position page that
> > belongs to other threads.
> >
> > ## Example crash
> > This out-of-bound write can cause the kernel to crash at arbitrary places,
> > depending on when the corrupted page is accessed by the other thread.
> >
> > We found this problem can happen in Linux kernel 6.1~6.12. Here is one crash in
> > Linux kernel 6.1:
>
> Thank you for reporting!
>
> I assume that you haven't tested kernels earlier than 6.1?
Unfortunately, we haven't tested older kernels.

>
> (Looking at the driver, there was no major change between 6.0 and 6.1,
> so this bug has probably been there for a long time.)
>
>
> Could you please share your reproducer and your kernel config as well?

Below we report our setup for linux kernel 6.12:

- General steps to reproduce the bug
1. Launch the VM
2. Copy the reproducer (compiled binary) into the VM
3. Run it with the root user
4. Wait for the bug to happen (generally takes less than 3 minutes)

- QEMU command (QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.30))
qemu-system-x86_64 -m 2G -smp 2 -kernel /linux-6.12/bzImage \
    -append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
    -drive file=./bullseye.img,format=raw \
    -net user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10021-:22 \
    -net nic,model=e1000 \
    -enable-kvm \
    -nographic \
    -pidfile vm.pid \
    2>&1 | tee vm.log

- VM image
It is created using Syzkaller's script:
https://github.com/google/syzkaller/blob/master/tools/create-image.sh

- bzImage
        - kernel: 6.12
        - GCC: Ubuntu 9.4.0-1ubuntu1~20.04.2
        - config:
https://drive.google.com/file/d/1ZfeXgVadChVJtIGx5zMhBqHnmlomP3Hf/view?usp=sharing
        - compiled bzimage:
https://drive.google.com/file/d/1MJf0WQ9_eztvuBcaBwCGC-rb7VBQtuac/view?usp=sharing

- reproducer
        - compiled binary:
https://drive.google.com/file/d/1Q9prtQKi5LVrOwrFJ162eXzTwTnDUq5X/view?usp=sharing
        - C program:
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#include <linux/capability.h>

static unsigned long long procid;

static __thread int clone_ongoing;
static __thread int skip_segv;
static __thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* ctx)
{
        if (__atomic_load_n(&clone_ongoing, __ATOMIC_RELAXED) != 0) {
                exit(sig);
        }
        uintptr_t addr = (uintptr_t)info->si_addr;
        const uintptr_t prog_start = 1 << 20;
        const uintptr_t prog_end = 100 << 20;
        int skip = __atomic_load_n(&skip_segv, __ATOMIC_RELAXED) != 0;
        int valid = addr < prog_start || addr > prog_end;
        if (skip && valid) {
                _longjmp(segv_env, 1);
        }
        exit(sig);
}

static void install_segv_handler(void)
{
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = SIG_IGN;
        syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
        syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_NODEFER | SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...) ({ int ok = 1; __atomic_fetch_add(&skip_segv,
1, __ATOMIC_SEQ_CST); if (_setjmp(segv_env) == 0) { __VA_ARGS__; }
else ok = 0; __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST); ok;
})

static void sleep_ms(uint64_t ms)
{
        usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts))
        exit(1);
        return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
        char buf[1024];
        va_list args;
        va_start(args, what);
        vsnprintf(buf, sizeof(buf), what, args);
        va_end(args);
        buf[sizeof(buf) - 1] = 0;
        int len = strlen(buf);
        int fd = open(file, O_WRONLY | O_CLOEXEC);
        if (fd == -1)
                return false;
        if (write(fd, buf, len) != len) {
                int err = errno;
                close(fd);
                errno = err;
                return false;
        }
        close(fd);
        return true;
}

static long syz_open_dev(volatile long a0, volatile long a1, volatile long a2)
{
        if (a0 == 0xc || a0 == 0xb) {
                char buf[128];
                sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" :
"block", (uint8_t)a1, (uint8_t)a2);
                return open(buf, O_RDWR, 0);
        } else {
                char buf[1024];
                char* hash;
                strncpy(buf, (char*)a0, sizeof(buf) - 1);
                buf[sizeof(buf) - 1] = 0;
                while ((hash = strchr(buf, '#'))) {
                        *hash = '0' + (char)(a1 % 10);
                        a1 /= 10;
                }
                return open(buf, a2, 0);
        }
}

static void setup_binderfs();
static void setup_fusectl();
static void sandbox_common_mount_tmpfs(void)
{
        write_file("/proc/sys/fs/mount-max", "100000");
        if (mkdir("./syz-tmp", 0777))
        exit(1);
        if (mount("", "./syz-tmp", "tmpfs", 0, NULL))
        exit(1);
        if (mkdir("./syz-tmp/newroot", 0777))
        exit(1);
        if (mkdir("./syz-tmp/newroot/dev", 0700))
        exit(1);
        unsigned bind_mount_flags = MS_BIND | MS_REC | MS_PRIVATE;
        if (mount("/dev", "./syz-tmp/newroot/dev", NULL,
bind_mount_flags, NULL))
        exit(1);
        if (mkdir("./syz-tmp/newroot/proc", 0700))
        exit(1);
        if (mount("syz-proc", "./syz-tmp/newroot/proc", "proc", 0, NULL))
        exit(1);
        if (mkdir("./syz-tmp/newroot/selinux", 0700))
        exit(1);
        const char* selinux_path = "./syz-tmp/newroot/selinux";
        if (mount("/selinux", selinux_path, NULL, bind_mount_flags, NULL)) {
                if (errno != ENOENT)
        exit(1);
                if (mount("/sys/fs/selinux", selinux_path, NULL,
bind_mount_flags, NULL) && errno != ENOENT)
        exit(1);
        }
        if (mkdir("./syz-tmp/newroot/sys", 0700))
        exit(1);
        if (mount("/sys", "./syz-tmp/newroot/sys", 0, bind_mount_flags, NULL))
        exit(1);
        if (mount("/sys/kernel/debug",
"./syz-tmp/newroot/sys/kernel/debug", NULL, bind_mount_flags, NULL) &&
errno != ENOENT)
        exit(1);
        if (mount("/sys/fs/smackfs",
"./syz-tmp/newroot/sys/fs/smackfs", NULL, bind_mount_flags, NULL) &&
errno != ENOENT)
        exit(1);
        if (mount("/proc/sys/fs/binfmt_misc",
"./syz-tmp/newroot/proc/sys/fs/binfmt_misc", NULL, bind_mount_flags,
NULL) && errno != ENOENT)
        exit(1);
        if (mkdir("./syz-tmp/pivot", 0777))
        exit(1);
        if (syscall(SYS_pivot_root, "./syz-tmp", "./syz-tmp/pivot")) {
                if (chdir("./syz-tmp"))
        exit(1);
        } else {
                if (chdir("/"))
        exit(1);
                if (umount2("./pivot", MNT_DETACH))
        exit(1);
        }
        if (chroot("./newroot"))
        exit(1);
        if (chdir("/"))
        exit(1);
        setup_binderfs();
        setup_fusectl();
}

static void setup_fusectl()
{
        if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
        }
}

static void setup_binderfs()
{
        if (mkdir("/dev/binderfs", 0777)) {
        }
        if (mount("binder", "/dev/binderfs", "binder", 0, NULL)) {
        }
        if (symlink("/dev/binderfs", "./binderfs")) {
        }
}

static void loop();

static void sandbox_common()
{
        prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
        if (getppid() == 1)
        exit(1);
        struct rlimit rlim;
        rlim.rlim_cur = rlim.rlim_max = (200 << 20);
        setrlimit(RLIMIT_AS, &rlim);
        rlim.rlim_cur = rlim.rlim_max = 32 << 20;
        setrlimit(RLIMIT_MEMLOCK, &rlim);
        rlim.rlim_cur = rlim.rlim_max = 136 << 20;
        setrlimit(RLIMIT_FSIZE, &rlim);
        rlim.rlim_cur = rlim.rlim_max = 1 << 20;
        setrlimit(RLIMIT_STACK, &rlim);
        rlim.rlim_cur = rlim.rlim_max = 128 << 20;
        setrlimit(RLIMIT_CORE, &rlim);
        rlim.rlim_cur = rlim.rlim_max = 256;
        setrlimit(RLIMIT_NOFILE, &rlim);
        if (unshare(CLONE_NEWNS)) {
        }
        if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL)) {
        }
        if (unshare(CLONE_NEWIPC)) {
        }
        if (unshare(0x02000000)) {
        }
        if (unshare(CLONE_NEWUTS)) {
        }
        if (unshare(CLONE_SYSVSEM)) {
        }
        typedef struct {
                const char* name;
                const char* value;
        } sysctl_t;
        static const sysctl_t sysctls[] = {
            {"/proc/sys/kernel/shmmax", "16777216"},
            {"/proc/sys/kernel/shmall", "536870912"},
            {"/proc/sys/kernel/shmmni", "1024"},
            {"/proc/sys/kernel/msgmax", "8192"},
            {"/proc/sys/kernel/msgmni", "1024"},
            {"/proc/sys/kernel/msgmnb", "1024"},
            {"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
        };
        unsigned i;
        for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
                write_file(sysctls[i].name, sysctls[i].value);
}

static int wait_for_loop(int pid)
{
        if (pid < 0)
        exit(1);
        int status = 0;
        while (waitpid(-1, &status, __WALL) != pid) {
        }
        return WEXITSTATUS(status);
}

static void drop_caps(void)
{
        struct __user_cap_header_struct cap_hdr = {};
        struct __user_cap_data_struct cap_data[2] = {};
        cap_hdr.version = _LINUX_CAPABILITY_VERSION_3;
        cap_hdr.pid = getpid();
        if (syscall(SYS_capget, &cap_hdr, &cap_data))
        exit(1);
        const int drop = (1 << CAP_SYS_PTRACE) | (1 << CAP_SYS_NICE);
        cap_data[0].effective &= ~drop;
        cap_data[0].permitted &= ~drop;
        cap_data[0].inheritable &= ~drop;
        if (syscall(SYS_capset, &cap_hdr, &cap_data))
        exit(1);
}

static int do_sandbox_none(void)
{
        if (unshare(CLONE_NEWPID)) {
        }
        int pid = fork();
        if (pid != 0)
                return wait_for_loop(pid);
        sandbox_common();
        drop_caps();
        if (unshare(CLONE_NEWNET)) {
        }
        write_file("/proc/sys/net/ipv4/ping_group_range", "0 65535");
        sandbox_common_mount_tmpfs();
        loop();
        exit(1);
}

static void kill_and_wait(int pid, int* status)
{
        kill(-pid, SIGKILL);
        kill(pid, SIGKILL);
        for (int i = 0; i < 100; i++) {
                if (waitpid(-1, status, WNOHANG | __WALL) == pid)
                        return;
                usleep(1000);
        }
        DIR* dir = opendir("/sys/fs/fuse/connections");
        if (dir) {
                for (;;) {
                        struct dirent* ent = readdir(dir);
                        if (!ent)
                                break;
                        if (strcmp(ent->d_name, ".") == 0 ||
strcmp(ent->d_name, "..") == 0)
                                continue;
                        char abort[300];
                        snprintf(abort, sizeof(abort),
"/sys/fs/fuse/connections/%s/abort", ent->d_name);
                        int fd = open(abort, O_WRONLY);
                        if (fd == -1) {
                                continue;
                        }
                        if (write(fd, abort, 1) < 0) {
                        }
                        close(fd);
                }
                closedir(dir);
        } else {
        }
        while (waitpid(-1, status, __WALL) != pid) {
        }
}

static void setup_test()
{
        prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
        setpgrp();
        write_file("/proc/self/oom_score_adj", "1000");
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
        int iter = 0;
        for (;; iter++) {
                int pid = fork();
                if (pid < 0)
        exit(1);
                if (pid == 0) {
                        setup_test();
                        execute_one();
                        exit(0);
                }
                int status = 0;
                uint64_t start = current_time_ms();
                for (;;) {
                        sleep_ms(10);
                        if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
                                break;
                        if (current_time_ms() - start < 5000)
                                continue;
                        kill_and_wait(pid, &status);
                        break;
                }
        }
}

uint64_t r[1] = {0xffffffffffffffff};

void execute_one(void)
{
        intptr_t res = 0;
        if (write(1, "executing program\n", sizeof("executing
program\n") - 1)) {}
        NONFAILING(memcpy((void*)0x20000000, "/dev/sg#\000", 9));
        res = -1;
        NONFAILING(res = syz_open_dev(/*dev=*/0x20000000, /*id=*/0,
/*flags=*/0));
        if (res != -1)
                r[0] = res;
        NONFAILING(memcpy((void*)0x20000040,
"\x00\x00\x00\x00\x42\x0d\x00\x00\x85\x0a\xaa", 11));
        NONFAILING(sprintf((char*)0x2000004b, "0x%016llx", (long long)r[0]));
        syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/1, /*arg=*/0x20000040ul);

}
int main(void)
{
        syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul,
/*prot=*/0ul, /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul,
/*fd=*/-1, /*offset=*/0ul);
        syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
/*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/7ul,
/*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul, /*fd=*/-1,
/*offset=*/0ul);
        syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul,
/*prot=*/0ul, /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/0x32ul,
/*fd=*/-1, /*offset=*/0ul);
        const char* reason;
        (void)reason;
        install_segv_handler();
        for (procid = 0; procid < 4; procid++) {
               if (fork() == 0) {
                        do_sandbox_none();
                }
        }
        sleep(1000000);
        return 0;
}

>
>
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-02 16:23   ` reveliofuzzing
@ 2025-01-17 14:26     ` Niklas Cassel
  2025-01-17 16:42       ` reveliofuzzing
  0 siblings, 1 reply; 10+ messages in thread
From: Niklas Cassel @ 2025-01-17 14:26 UTC (permalink / raw)
  To: reveliofuzzing; +Cc: damien.lemoal, linux-ide

Hello reveliofuzzing,

On Thu, Jan 02, 2025 at 11:23:49AM -0500, reveliofuzzing wrote:
> On Thu, Jan 2, 2025 at 5:40 AM Niklas Cassel <cassel@kernel.org> wrote:
> > On Wed, Jan 01, 2025 at 09:17:02PM -0500, reveliofuzzing wrote:
> > > Hi there,
> > >
> > > We found an out-of-bounds write in the function ata_pio_sector, which can cause
> > > the kernel to crash. We would like to report it for your reference.
> > >
> > > ## Problem in ata_pio_sector
> > > ata_pio_sector uses the following code to decide which page to use for the I/O:
> > > page = sg_page(qc->cursg);
> > > offset = qc->cursg->offset + qc->cursg_ofs;
> > >
> > > /* get the current page and offset */
> > > page = nth_page(page, (offset >> PAGE_SHIFT));
> > > offset %= PAGE_SIZE;
> > > but we found that `offset` could be as high as 0x5000---qc->cursg_ofs==0x5000,
> > > qc->cursg->offset == 0x0, making `page` point to a higher-position page that
> > > belongs to other threads.
> > >
> > > ## Example crash
> > > This out-of-bound write can cause the kernel to crash at arbitrary places,
> > > depending on when the corrupted page is accessed by the other thread.
> > >
> > > We found this problem can happen in Linux kernel 6.1~6.12. Here is one crash in
> > > Linux kernel 6.1:
> >
> > Thank you for reporting!
> >
> > I assume that you haven't tested kernels earlier than 6.1?
> Unfortunately, we haven't tested older kernels.
> 
> >
> > (Looking at the driver, there was no major change between 6.0 and 6.1,
> > so this bug has probably been there for a long time.)
> >
> >
> > Could you please share your reproducer and your kernel config as well?
> 
> Below we report our setup for linux kernel 6.12:
> 
> - General steps to reproduce the bug
> 1. Launch the VM
> 2. Copy the reproducer (compiled binary) into the VM
> 3. Run it with the root user
> 4. Wait for the bug to happen (generally takes less than 3 minutes)

I managed to reproduce the bug using your bzImage and syz-executor binary.

However, the .config you provided does not match the bzImage.
E.g. the e1000/e1000e driver is not built-in in your .config,
so I get no networking, while it is enabled in your bzImage.
This makes me worried that you have other changes in your .config.
If you still have the exact config for this bzImage, could you please add
it as an attachment?

I've been using the syz-executor binary that you attached, since the C code
pasted below does not compile, it seems like it has some unintended newlines.
Perhaps you could add it as an attachment instead?

Also, you only talk about 6.12 kernel. Out of curiosity, have you managed to
reproduce this bug on v6.13-rc kernels? Have you tried?


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-17 14:26     ` Niklas Cassel
@ 2025-01-17 16:42       ` reveliofuzzing
  2025-01-20 13:54         ` Niklas Cassel
  0 siblings, 1 reply; 10+ messages in thread
From: reveliofuzzing @ 2025-01-17 16:42 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: damien.lemoal, linux-ide

On Fri, Jan 17, 2025 at 9:26 AM Niklas Cassel <cassel@kernel.org> wrote:
>
> Hello reveliofuzzing,
>
> On Thu, Jan 02, 2025 at 11:23:49AM -0500, reveliofuzzing wrote:
> > On Thu, Jan 2, 2025 at 5:40 AM Niklas Cassel <cassel@kernel.org> wrote:
> > > On Wed, Jan 01, 2025 at 09:17:02PM -0500, reveliofuzzing wrote:
> > > > Hi there,
> > > >
> > > > We found an out-of-bounds write in the function ata_pio_sector, which can cause
> > > > the kernel to crash. We would like to report it for your reference.
> > > >
> > > > ## Problem in ata_pio_sector
> > > > ata_pio_sector uses the following code to decide which page to use for the I/O:
> > > > page = sg_page(qc->cursg);
> > > > offset = qc->cursg->offset + qc->cursg_ofs;
> > > >
> > > > /* get the current page and offset */
> > > > page = nth_page(page, (offset >> PAGE_SHIFT));
> > > > offset %= PAGE_SIZE;
> > > > but we found that `offset` could be as high as 0x5000---qc->cursg_ofs==0x5000,
> > > > qc->cursg->offset == 0x0, making `page` point to a higher-position page that
> > > > belongs to other threads.
> > > >
> > > > ## Example crash
> > > > This out-of-bound write can cause the kernel to crash at arbitrary places,
> > > > depending on when the corrupted page is accessed by the other thread.
> > > >
> > > > We found this problem can happen in Linux kernel 6.1~6.12. Here is one crash in
> > > > Linux kernel 6.1:
> > >
> > > Thank you for reporting!
> > >
> > > I assume that you haven't tested kernels earlier than 6.1?
> > Unfortunately, we haven't tested older kernels.
> >
> > >
> > > (Looking at the driver, there was no major change between 6.0 and 6.1,
> > > so this bug has probably been there for a long time.)
> > >
> > >
> > > Could you please share your reproducer and your kernel config as well?
> >
> > Below we report our setup for linux kernel 6.12:
> >
> > - General steps to reproduce the bug
> > 1. Launch the VM
> > 2. Copy the reproducer (compiled binary) into the VM
> > 3. Run it with the root user
> > 4. Wait for the bug to happen (generally takes less than 3 minutes)
>
> I managed to reproduce the bug using your bzImage and syz-executor binary.
>
> However, the .config you provided does not match the bzImage.
> E.g. the e1000/e1000e driver is not built-in in your .config,
> so I get no networking, while it is enabled in your bzImage.
> This makes me worried that you have other changes in your .config.
> If you still have the exact config for this bzImage, could you please add
> it as an attachment?
Hi, we double-checked it but found the config shared above is the one we used.
CONFIG_E1000XXX is enabled in this config.

>
> I've been using the syz-executor binary that you attached, since the C code
> pasted below does not compile, it seems like it has some unintended newlines.
> Perhaps you could add it as an attachment instead?
Here is the C program:
https://drive.google.com/file/d/1Uvhqrn-ntEYQT2PBiQjp0xaor-32WYHO/view?usp=sharing
Please let us know if you still can't compile it. We can take a look
at how Syzkaller
generates this C program and compiles it into the syz-executor binary.

>
> Also, you only talk about 6.12 kernel. Out of curiosity, have you managed to
> reproduce this bug on v6.13-rc kernels? Have you tried?
We haven't tried it yet, but we can do that in the next few days. Will keep you
posted.

>
>
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-17 16:42       ` reveliofuzzing
@ 2025-01-20 13:54         ` Niklas Cassel
  2025-01-20 16:47           ` reveliofuzzing
  0 siblings, 1 reply; 10+ messages in thread
From: Niklas Cassel @ 2025-01-20 13:54 UTC (permalink / raw)
  To: reveliofuzzing; +Cc: Damien Le Moal, linux-ide

On Fri, Jan 17, 2025 at 11:42:45AM -0500, reveliofuzzing wrote:
> >
> > However, the .config you provided does not match the bzImage.
> > E.g. the e1000/e1000e driver is not built-in in your .config,
> > so I get no networking, while it is enabled in your bzImage.
> > This makes me worried that you have other changes in your .config.
> > If you still have the exact config for this bzImage, could you please add
> > it as an attachment?
> Hi, we double-checked it but found the config shared above is the one we used.
> CONFIG_E1000XXX is enabled in this config.

You are right.

For some reason it got compiled as a module when I did "make olddefconfig",
with your config as base. Sorry about the confusion!


> 
> >
> > I've been using the syz-executor binary that you attached, since the C code
> > pasted below does not compile, it seems like it has some unintended newlines.
> > Perhaps you could add it as an attachment instead?
> Here is the C program:
> https://drive.google.com/file/d/1Uvhqrn-ntEYQT2PBiQjp0xaor-32WYHO/view?usp=sharing
> Please let us know if you still can't compile it. We can take a look
> at how Syzkaller
> generates this C program and compiles it into the syz-executor binary.

Still does not compile for me.

It still appears to have some uninteded newlines.

You probably copy pasted it from an editor instead of uploading it/sending
it directly.


One example is:
line380:		if (write(1, "executing program\n", sizeof("executing
line381:	program\n") - 1)) {}

Strings in C are not allowed to span multiple lines without a backslash
immediately before the newline, or by using string concatenation.


> 
> >
> > Also, you only talk about 6.12 kernel. Out of curiosity, have you managed to
> > reproduce this bug on v6.13-rc kernels? Have you tried?
> We haven't tried it yet, but we can do that in the next few days. Will keep you
> posted.

I got an off-list email that mentioned that you could reproduce on 6.13-rc7,
thank you!

Hopefully I will have some time to try to debug this sometime this week.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-20 13:54         ` Niklas Cassel
@ 2025-01-20 16:47           ` reveliofuzzing
  0 siblings, 0 replies; 10+ messages in thread
From: reveliofuzzing @ 2025-01-20 16:47 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: Damien Le Moal, linux-ide

On Mon, Jan 20, 2025 at 8:54 AM Niklas Cassel <cassel@kernel.org> wrote:
>
> On Fri, Jan 17, 2025 at 11:42:45AM -0500, reveliofuzzing wrote:
> > >
> > > However, the .config you provided does not match the bzImage.
> > > E.g. the e1000/e1000e driver is not built-in in your .config,
> > > so I get no networking, while it is enabled in your bzImage.
> > > This makes me worried that you have other changes in your .config.
> > > If you still have the exact config for this bzImage, could you please add
> > > it as an attachment?
> > Hi, we double-checked it but found the config shared above is the one we used.
> > CONFIG_E1000XXX is enabled in this config.
>
> You are right.
>
> For some reason it got compiled as a module when I did "make olddefconfig",
> with your config as base. Sorry about the confusion!
>
>
> >
> > >
> > > I've been using the syz-executor binary that you attached, since the C code
> > > pasted below does not compile, it seems like it has some unintended newlines.
> > > Perhaps you could add it as an attachment instead?
> > Here is the C program:
> > https://drive.google.com/file/d/1Uvhqrn-ntEYQT2PBiQjp0xaor-32WYHO/view?usp=sharing
> > Please let us know if you still can't compile it. We can take a look
> > at how Syzkaller
> > generates this C program and compiles it into the syz-executor binary.
>
> Still does not compile for me.
>
> It still appears to have some uninteded newlines.
>
> You probably copy pasted it from an editor instead of uploading it/sending
> it directly.
>
>
> One example is:
> line380:                if (write(1, "executing program\n", sizeof("executing
> line381:        program\n") - 1)) {}
>
> Strings in C are not allowed to span multiple lines without a backslash
> immediately before the newline, or by using string concatenation.
Hi, we have updated the c program here:
https://drive.google.com/file/d/1Uvhqrn-ntEYQT2PBiQjp0xaor-32WYHO/view?usp=sharing

This was the command used for compiling it:
gcc -o /tmp/syz-executor -DGOOS_linux=1 -DGOARCH_amd64=1
-DHOSTGOOS_linux=1 -x c - -m64 -O2 -pthread -Wall -Werror -Wparentheses
-Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow
-Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable
-Wno-unused-command-line-argument -static-pie < ./reproducer.c

>
>
> >
> > >
> > > Also, you only talk about 6.12 kernel. Out of curiosity, have you managed to
> > > reproduce this bug on v6.13-rc kernels? Have you tried?
> > We haven't tried it yet, but we can do that in the next few days. Will keep you
> > posted.
>
> I got an off-list email that mentioned that you could reproduce on 6.13-rc7,
> thank you!
>
> Hopefully I will have some time to try to debug this sometime this week.
>
>
> Kind regards,
> Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-02  2:17 out-of-bounds write in the function ata_pio_sector reveliofuzzing
  2025-01-02 10:40 ` Niklas Cassel
@ 2025-01-22 14:59 ` Niklas Cassel
  2025-01-29  3:09   ` Martin K. Petersen
  1 sibling, 1 reply; 10+ messages in thread
From: Niklas Cassel @ 2025-01-22 14:59 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Martin K. Petersen
  Cc: Damien Le Moal, linux-ide, reveliofuzzing

Hello Jens, Christoph, Martin,

On Wed, Jan 01, 2025 at 09:17:02PM -0500, reveliofuzzing wrote:
> Hi there,
>
> We found an out-of-bounds write in the function ata_pio_sector, which can cause
> the kernel to crash. We would like to report it for your reference.
>
> ## Problem in ata_pio_sector
> ata_pio_sector uses the following code to decide which page to use for the I/O:
> page = sg_page(qc->cursg);
> offset = qc->cursg->offset + qc->cursg_ofs;
>
> /* get the current page and offset */
> page = nth_page(page, (offset >> PAGE_SHIFT));
> offset %= PAGE_SIZE;
> but we found that `offset` could be as high as 0x5000---qc->cursg_ofs==0x5000,
> qc->cursg->offset == 0x0, making `page` point to a higher-position page that
> belongs to other threads.

I've managed to reproduce this on v6.13 using reveliofuzzing syzkaller
reproducer (which issues SCSI_IOCTL_SEND_COMMAND to the /dev/sg devices).

The problem is that the final if-statement in function ata_pio_sector()
(which lives in drivers/ata/libata-sff.c) looks like this:

	if (qc->cursg_ofs == qc->cursg->length) {
		qc->cursg = sg_next(qc->cursg);
		if (!qc->cursg)
			ap->hsm_task_state = HSM_ST_LAST;
		qc->cursg_ofs = 0;
	}

When the memory corruption occurs, it is because this if-statement never
evaluated to true, so ata_pio_sector() just continued writing to more and
more pages, overwriting pages owned by other tasks.


Here is some trace_printks that I added that shows the problem:

     kworker/1:3-116     [001] d.h1.   121.445073: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x0 cursg->offset: 0x0 offset: 0x0 offset_mod: 0x0 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   121.445079: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x0
     kworker/1:3-116     [001] d.h1.   121.445098: ata_pio_sector: 3/3 qc->cursg_ofs: 0x200 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1
     kworker/1:3-116     [001] d.h1.   121.445164: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x200 cursg->offset: 0x0 offset: 0x200 offset_mod: 0x200 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   121.445168: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x200
     kworker/1:3-116     [001] d.h1.   121.445185: ata_pio_sector: 3/3 qc->cursg_ofs: 0x400 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1
     kworker/1:3-116     [001] d.h1.   123.492925: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x400 cursg->offset: 0x0 offset: 0x400 offset_mod: 0x400 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   123.492929: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x400
     kworker/1:3-116     [001] d.h1.   123.492943: ata_pio_sector: 3/3 qc->cursg_ofs: 0x600 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1
     kworker/1:3-116     [001] d.h1.   123.492990: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x600 cursg->offset: 0x0 offset: 0x600 offset_mod: 0x600 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   123.492993: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x600
     kworker/1:3-116     [001] d.h1.   123.493005: ata_pio_sector: 3/3 qc->cursg_ofs: 0x800 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1

We can see that qc->cursg_ofs gets incremented by qc->sect_size (0x200),
but since qc->cursg_ofs is never equal to qc->cursg->length (0xd42),
it doesn't stop, and ends up writing way more than qc->cursg->length.



If I compare the function:
/* Transfer qc->sect_size bytes of data from/to the ATA device. */
ata_pio_sector()

with function:
/* __atapi_pio_bytes - Transfer data from/to the ATAPI device. */
__atapi_pio_bytes()


I can see that __atapi_pio_bytes() actually has this code:
/* don't overrun current sg */
count = min(sg->length - qc->cursg_ofs, bytes);


which I guess makes sense, since the function has _bytes_ in the name.
ata_pio_sector() has _sector_ in the name, so it makes sense that it
would have to transfer one or more sectors (even if there currently
doesn't seem to be anything that guarantees this...).


So, should I add a similar:
count = min(sg->length - qc->cursg_ofs, bytes);
to ata_pio_sector(), or should I add code to
ata_pio_sector() that simply rejects a qc->nbytes that is not a multiple
of qc->sect_size ?



I was kind of expecting some upper layer, SCSI or block, to have rejected
an operation that is not a multiple of the sector size.

Is that a silly assumption?


Looking at drivers/scsi/scsi_ioctl.c:sg_scsi_ioctl() I can see that it
rejects:
if (in_len > PAGE_SIZE || out_len > PAGE_SIZE)


If I dump the request in gdb, it seems that:
cmd_flags is: REQ_OP_DRV_IN
__data_len is: 0xd42

(gdb) p /x *((struct request *) ((void*)qc->scsicmd - sizeof(struct request)))
$21 = {q = 0xffff888008d50000, mq_ctx = 0xffff88806d242980, mq_hctx = 0xffff88800781e000, 
  cmd_flags = 0x22, rq_flags = 0x18, tag = 0x0, internal_tag = 0x1, timeout = 0xea60, 
  __data_len = 0xd42, __sector = 0xffffffffffffffff, bio = 0xffff8880088fc700, 
  biotail = 0xffff8880088fc700, {queuelist = {next = 0xffff8880088f12c8, 
      prev = 0xffff8880088f12c8}, rq_next = 0xffff8880088f12c8}, part = 0x0, 
  alloc_time_ns = 0x0, start_time_ns = 0xa837cc389, io_start_time_ns = 0x0, 
  stats_sectors = 0x0, nr_phys_segments = 0x1, nr_integrity_segments = 0x0, state = 0x1, 
  ref = {counter = 0x1}, deadline = 0xfffd2516, {hash = {next = 0x0, pprev = 0x0}, 
    ipi_list = {next = 0x0}}, {rb_node = {__rb_parent_color = 0xffff8880088f1320, 
      rb_right = 0x0, rb_left = 0x0}, special_vec = {bv_page = 0xffff8880088f1320, 
      bv_len = 0x0, bv_offset = 0x0}}, elv = {icq = 0x0, priv = {0xffff88800781f860, 
      0x0}}, flush = {seq = 0x7, saved_end_io = 0x0}, fifo_time = 0xfffc2f88, 
  end_io = 0xffffffff81f55550, end_io_data = 0xffff88800de2fb80}

p /x qc->scsicmd->cmnd[0]
$25 = 0x85
include/scsi/scsi_proto.h:#define       ATA_16                0x85      /* 16-byte pass-thru */


which must have meant that:
in_len == 0, out_len = 0xd42 (3394).

The page size on this system is 4k (4096), so I suppose that out_len is not
greater than PAGE_SIZE.

So I guess that SCSI layer does not reject a request that is not a multiple of
the sector size.

So what is the best solution here?

Should we add such a check in SCSI? libata? or should each (libata) driver
themselves reject such a thing?

Advice is welcome :)


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-22 14:59 ` Niklas Cassel
@ 2025-01-29  3:09   ` Martin K. Petersen
  2025-01-29  9:57     ` Niklas Cassel
  0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2025-01-29  3:09 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Jens Axboe, Christoph Hellwig, Martin K. Petersen, Damien Le Moal,
	linux-ide, reveliofuzzing


Niklas,

Sorry about the delay, was out for a few days.

> I was kind of expecting some upper layer, SCSI or block, to have rejected
> an operation that is not a multiple of the sector size.
>
> Is that a silly assumption?

Not all SCSI commands operate on logical blocks. Plus even if they do
the actual data transfer could still be larger than one block due to PI
or long writes.

That's all a bit theoretical in the context of the archaic
sg_scsi_ioctl() call since that only takes a single page and has little
practical use. But in general we can't assume that everything is a
multiple of 512 bytes.

Your fix looks OK to me.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: out-of-bounds write in the function ata_pio_sector
  2025-01-29  3:09   ` Martin K. Petersen
@ 2025-01-29  9:57     ` Niklas Cassel
  0 siblings, 0 replies; 10+ messages in thread
From: Niklas Cassel @ 2025-01-29  9:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, Damien Le Moal, linux-ide,
	reveliofuzzing

Hello Martin,

On Tue, Jan 28, 2025 at 10:09:35PM -0500, Martin K. Petersen wrote:
> 
> Niklas,
> 
> Sorry about the delay, was out for a few days.
> 
> > I was kind of expecting some upper layer, SCSI or block, to have rejected
> > an operation that is not a multiple of the sector size.
> >
> > Is that a silly assumption?
> 
> Not all SCSI commands operate on logical blocks. Plus even if they do
> the actual data transfer could still be larger than one block due to PI
> or long writes.
> 
> That's all a bit theoretical in the context of the archaic
> sg_scsi_ioctl() call since that only takes a single page and has little
> practical use. But in general we can't assume that everything is a
> multiple of 512 bytes.

Thank you.

I basically came to the same conclusion.
(We can't assume that everything is a multiple of 512 bytes.)


Looking at ACS-6, Table A.2 — Command codes (sorted by command code)
and looking at all commands that are of type:
PIO Data-In and PIO Data-Out.

Most commands use the COUNT field to mean a unit in either sectors or log
page count (which is also a multiple of sectors), but some commands, e.g.
TRUSTED RECEIVE 5Ch and TRUSTED SEND 5Eh, it means TRANSFER LENGTH, which
is security protocol specific.

Looking at TCG (SIIS), TRANSFER LENGTH is a multiple of sectors.
I don't know about other security protocols (if any).


It is probably quite safe to make the assumption that the COUNT field in
ACS will always be a multiple of sectors for PIO Data-In and PIO Data-Out
commands, so we probably could add a check in generic libata code
somewhere... but by adding such a check in generic libata code, for it to
be simple, it would probably need to apply to more than just PIO Data-In
and PIO Data-Out commands, and I'm not sure if we can make such an
assumption.

Thus, I'm happy with the fix:
https://lore.kernel.org/linux-ide/20250127154303.15567-2-cassel@kernel.org/
for now.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-01-29  9:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-02  2:17 out-of-bounds write in the function ata_pio_sector reveliofuzzing
2025-01-02 10:40 ` Niklas Cassel
2025-01-02 16:23   ` reveliofuzzing
2025-01-17 14:26     ` Niklas Cassel
2025-01-17 16:42       ` reveliofuzzing
2025-01-20 13:54         ` Niklas Cassel
2025-01-20 16:47           ` reveliofuzzing
2025-01-22 14:59 ` Niklas Cassel
2025-01-29  3:09   ` Martin K. Petersen
2025-01-29  9:57     ` Niklas Cassel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).