From mboxrd@z Thu Jan 1 00:00:00 1970 From: Al Viro Subject: Re: [GIT PULL] Orangefs (text only resend) Date: Sun, 6 Sep 2015 21:20:19 +0100 Message-ID: <20150906202019.GJ22011@ZenIV.linux.org.uk> References: <20150906063552.GA7224@lst.de> <20150906090838.GI22011@ZenIV.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Christoph Hellwig , Linus Torvalds , linux-fsdevel , Andrew Morton , Stephen Rothwell , Boaz Harrosh , Greg Kroah-Hartman To: Mike Marshall Return-path: Received: from zeniv.linux.org.uk ([195.92.253.2]:39247 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752826AbbIFUU3 (ORCPT ); Sun, 6 Sep 2015 16:20:29 -0400 Content-Disposition: inline In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sun, Sep 06, 2015 at 10:52:16AM -0400, Mike Marshall wrote: > int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap, > struct iov_iter *iter, > size_t total_size, > int buffer_index) > { > struct pvfs_bufmap_desc *from; > int ret = 0; void *from_kaddr = NULL; > from = &bufmap->desc_array[buffer_index]; > > from_kaddr = pvfs2_kmap(from->page_array[0]); > ret = copy_to_iter(from_kaddr, total_size, iter); > return ret; > } Wrappers Are Evil: pvfs2_kmap(). It obfuscates the things for no reason whatsoever, is actively wrong for a lot of reasons (kmap() slots are system-wide resoure, and not too plentiful one, at that) _and_ obfuscates the open-coding of existing primitive, badly: copy_page_to_iter() does the same thing with less headache for caller and without a leak (you forgot kunmap the damn thing afterwards). > "cat filename" is a simple test that drives execution > down the "Are we copying to User Virtual Addresses?" > code path, and big and small files still copy > out. > > I haven't found a test case that drives execution down > the "Are we copying to Kern Virtual Addresses?" code > path yet. Ummm... GrepTFS? pvfs_bufmap_copy_to_kernel_iovec() called from postcopy_buffers() when the last argument is 0, which is called from wait_for_direct_io() in the same conditions plus the first argument being PVFS_IO_READ. Said function has only two callers, one in do_readv_writev() (the last argument is 1, out of consideration), another in pvfs2_inode_read(). Which does + ret = wait_for_direct_io(PVFS_IO_READ, inode, offset, &vec, 1, + count, readahead_size, 0); which should hit pvfs_bufmap_copy_to_kernel_iovec() just fine... Incidentally, it's misannotated - +ssize_t pvfs2_inode_read(struct inode *inode, + char __user *buf, + size_t count, + loff_t *offset, + loff_t readahead_size) +{ claims buf to be a userland pointer, while its only caller is passing kmap(some page) there. This bytes_read = pvfs2_inode_read(inode, - page_data, + (char __user *) page_data, blocksize, &blockptr_offset, in later commit forces sparce to STFU, but the point of __user annotations is to be able to keep track which pointers are which, after all... Oh, and to continue GTFS, the only caller of pvfs2_inode_read() appears to be read_one_page(), called from pvfs2_readpage() and pvfs2_readpages(), which are ->readpage() and ->readpages() methods in pvfs2_address_operations. So I'd suggest trying to mmap and dereference. Or execve() of a binary there... BTW, this + /* map the pages */ + down_write(¤t->mm->mmap_sem); + ret = get_user_pages(current, + current->mm, + (unsigned long)user_desc->ptr, + bufmap->page_count, + 1, + 0, + bufmap->page_array, + NULL); + up_write(¤t->mm->mmap_sem); should almost certainly be replaced with ret = get_user_pages_fast((unsigned long)user_desc->ptr, bufmap->page_count, 1,bufmap->page_array); and let it deal with ->mmap_sem.