From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Thu, 22 Oct 2020 13:59:32 -0700
From: Eric Biggers
Subject: Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
Message-ID: <20201022205932.GB3613750@gmail.com>
References: <20201022090155.GA1483166@kroah.com>
	<5fd6003b-55a6-2c3c-9a28-8fd3a575ca78@redhat.com>
	<20201022132342.GB8781@lst.de>
	<8f1fff0c358b4b669d51cc80098dbba1@AcuMS.aculab.com>
	<20201022164040.GV20115@casper.infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
List-ID:
To: Nick Desaulniers
Cc: Matthew Wilcox, David Laight, Christoph Hellwig, David Hildenbrand,
	Greg KH, Al Viro, "kernel-team@android.com", Andrew Morton,
	Jens Axboe, Arnd Bergmann, David Howells,
	"linux-arm-kernel@lists.infradead.org",
	"linux-kernel@vger.kernel.org",
	"linux-mips@vger.kernel.org",
	"linux-parisc@vger.kernel.org",
	"linuxppc-dev@lists.ozlabs.org",
	"linux-s390@vger.kernel.org",
	"sparclinux@vger.kernel.org",
	"linux-block@vger.kernel.org",
	"linux-scsi@vger.kernel.org",
	"linux-fsdevel@vger.kernel.org",
	"linux-aio@kvack.org",
	"io-uring@vger.kernel.org",
	"linux-arch@vger.kernel.org",
	"linux-mm@kvack.org",
	"netdev@vger.kernel.org",
	"keyrings@vger.kernel.org",
	"linux-security-module@vger.kernel.org"

On Thu, Oct 22, 2020 at 10:00:44AM -0700, Nick Desaulniers wrote:
> On Thu, Oct 22, 2020 at 9:40 AM Matthew Wilcox wrote:
> >
> > On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > > Wait...
> > > readv(2) defines:
> > >	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
> >
> > It doesn't really matter what the manpage says.  What does the AOSP
> > libc header say?
>
> Same: https://android.googlesource.com/platform/bionic/+/refs/heads/master/libc/include/sys/uio.h#38
>
> Theoretically someone could bypass libc to make a system call, right?
>
> >
> > > But the syscall is defined as:
> > >
> > > SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> > >                 unsigned long, vlen)
> > > {
> > >         return do_readv(fd, vec, vlen, 0);
> > > }
> >
>

FWIW, glibc makes the readv() syscall assuming that fd and vlen are 'int' as
well.  So this problem isn't specific to Android's libc.

From objdump -d /lib/x86_64-linux-gnu/libc.so.6:

	00000000000f4db0 :
	   f4db0:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
	   f4db7:	00
	   f4db8:	85 c0                	test   %eax,%eax
	   f4dba:	75 14                	jne    f4dd0
	   f4dbc:	b8 13 00 00 00       	mov    $0x13,%eax
	   f4dc1:	0f 05                	syscall
	   ...

There's some code for pthread cancellation, but no zeroing of the upper half of
the fd and vlen arguments, which are in %edi and %edx respectively.  Yet the
glibc function prototype declares them as 'int', not 'unsigned long':

	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);

So the high halves of the registers holding fd and iovcnt (%rdi and %rdx) can
contain garbage.

Or at least that's what gcc (9.3.0) and clang (9.0.1) assume; they both compile
the following

	void g(unsigned int x);

	void f(unsigned long x)
	{
		g(x);
	}

into f() making a tail call to g(), without zeroing the top half of %rdi.

Also note the following program succeeds on Linux 5.9 on x86_64.  On kernels
that have this bug, it should fail.  (I couldn't get it to actually fail, so it
must depend on the compiler and/or the kernel config...)

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <sys/uio.h>
	#include <unistd.h>

	int main()
	{
		int fd = open("/dev/zero", O_RDONLY);
		char buf[1000];
		struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
		long ret;

		/* vlen is 1 in the low 32 bits, with bit 32 set as "garbage" */
		ret = syscall(__NR_readv, fd, &iov, 0x100000001);
		if (ret < 0)
			perror("readv failed");
		else
			printf("read %ld bytes\n", ret);
	}

I think the right fix is to change the readv() (and writev(), etc.) syscalls to
take 'unsigned int' rather than 'unsigned long', as that is what the users are
assuming...

- Eric