From: Jevon Qiao
Subject: Re: [PATCH 2/2] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize
Date: Fri, 19 Feb 2016 16:56:00 +0800
Message-ID: <56C6D8A0.2040502@gmail.com>
In-Reply-To: <87h9h7z9a0.fsf@linux.vnet.ibm.com>
References: <56C02E4F.6030303@gmail.com> <87y4an8l93.fsf@linux.vnet.ibm.com> <56C41DE8.2030904@gmail.com> <87h9h7z9a0.fsf@linux.vnet.ibm.com>
To: "Aneesh Kumar K.V", qemu-devel@nongnu.org, ceph-devel@vger.kernel.org
Cc: sage@newdream.net, haomaiwang@gmail.com, gkurz@linux.vnet.ibm.com, gfarnum@redhat.com, mst@redhat.com

Hi Aneesh,

> I am not sure I understand the details correctly. iounit is the size
> that we use in client_read to determine the size in which
> we should request I/O from the client. But we still can't do I/O in size
> larger than s->msize. If you look at the client side (kernel 9p fs), you
> will find
>
> 	rsize = fid->iounit;
> 	if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
> 		rsize = clnt->msize - P9_IOHDRSZ;

Yes, I know this.

> if your iounit calculation ends up zero, that should be handled
> correctly by
>
>     if (!iounit) {
>         iounit = s->msize - P9_IOHDRSZ;
>     }
>     return iounit;
>
> So what is the issue here?

This results in an alignment issue when the client maps the requested I/O
into pages in p9_nr_pages():
int p9_nr_pages(char *data, int len)
{
        unsigned long start_page, end_page;
        start_page =  (unsigned long)data >> PAGE_SHIFT;
        end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
        return end_page - start_page;
}
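
To make the page arithmetic concrete, here is a tiny standalone sketch of the
same calculation (the buffer address below is made up; the point is what
happens when the buffer does not sit right at a page boundary):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Same arithmetic as p9_nr_pages(), just as a standalone demo. */
static long nr_pages(unsigned long data, int len)
{
        unsigned long start_page = data >> PAGE_SHIFT;
        unsigned long end_page = (data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;

        return end_page - start_page;
}

int main(void)
{
        /* hypothetical user buffer that starts 0x800 bytes into a page */
        unsigned long buf = 0x7f1234560800UL;

        printf("%ld\n", nr_pages(buf, 8168));                    /* 3 pages */
        printf("%ld\n", nr_pages(buf & ~(PAGE_SIZE - 1), 4096)); /* 1 page  */
        return 0;
}

So an 8168-byte chunk at such an address straddles three pages, while a
page-aligned 4096-byte chunk maps one-to-one onto a single page.
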
Please see the following experiment I did without the fix.

1) Start qemu with cephfs:
$ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2 -smp 4 -m 4096 -fsdev cephfs,security_model=passthrough,id=fsdev0,path=/ -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs --enable-kvm -nographic -net nic -net tap,ifname=tap0,script=no,downscript=no

2) Mount the fs in the guest.
[root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L cephfs /mnt
[root@localhost ~]# ls -lah /mnt/8kfile
-rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile
In this case I used the default msize, which is 8192 bytes. Since cephfs
reports 4M as f_bsize, which is larger than msize, the iounit calculation
falls back to msize - P9_IOHDRSZ = 8192 - 24 = 8168.
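
A standalone sketch of how the pre-patch calculation behaves with these
numbers; it mirrors the hw/9pfs logic quoted above rather than copying the
qemu code verbatim:

#include <stdio.h>
#include <stdint.h>

#define P9_IOHDRSZ 24

/* Pre-patch behaviour: round iounit down to a multiple of the host
 * f_bsize, then fall back to msize - P9_IOHDRSZ when that yields zero. */
static uint32_t iounit_before_fix(uint32_t msize, uint32_t f_bsize)
{
        uint32_t iounit = f_bsize * ((msize - P9_IOHDRSZ) / f_bsize);

        if (!iounit)
                iounit = msize - P9_IOHDRSZ;
        return iounit;
}

int main(void)
{
        /* cephfs reports f_bsize = 4M, the default msize is 8192 */
        printf("%u\n", iounit_before_fix(8192, 4 * 1024 * 1024)); /* 8168 */
        return 0;
}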

3) Run the following systemtap script to trace the paging result:
[root@localhost ~]# cat p9_read.stp
probe kernel.function("p9_virtio_zc_request").call
{
    printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
}

# int_arg(1) is the data pointer passed to p9_nr_pages(); 12 is PAGE_SHIFT
# and 8168 is the rsize in the pre-fix run (adjust when rsize changes).
probe kernel.function("p9_nr_pages").call
{
    printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
    printf("p9_nr_pages: end_page = %ld\n", (int_arg(1) + 8168 + 4096 - 1) >> 12);
}
4) Output I got when copying /mnt/8kfile to the /tmp directory:
p9_virtio_zc_request: inlen size is 8168
p9_nr_pages: start_page = 34293757815
p9_nr_pages: end_page = 34293757818

From the trace above (start_page = 34293757815, end_page = 34293757818),
the 8K of data ends up being mapped into three pages, which can hurt
performance.

I also enabled the cephfs debug output I added to see how the data is
distributed in this case; the result is as follows:
CEPHFS_DEBUG: cephfs_preadv iov_len=4096
CEPHFS_DEBUG: cephfs_preadv iov_len=4072
CEPHFS_DEBUG: cephfs_preadv iov_len=24
This patch aims to fix that, and it turns out to work quite well: all the
data is well aligned.
p9_virtio_zc_request: inlen size is 4096
p9_nr_pages: start_page = 34203171814
p9_nr_pages: end_page = 34203171815
p9_virtio_zc_request: inlen size is 4096
p9_nr_pages: start_page = 34203171815
p9_nr_pages: end_page = 34203171816

CEPHFS_DEBUG: cephfs_preadv iov_len=4096
CEPHFS_DEBUG: cephfs_preadv iov_len=4096
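
Just to spell out the arithmetic behind the two runs: before the fix the
8 KiB file is fetched as an 8168-byte request plus a 24-byte request (the
8168-byte one then shows up host-side as 4096 + 4072, since it no longer
lines up with page boundaries, which matches the CEPHFS_DEBUG output above);
after the fix it is two clean 4096-byte requests. A small standalone sketch
of that chunking (simplified; the rsize values are taken from the runs above):

#include <stdio.h>

/* Simplified view of how the client splits a read into requests of at
 * most rsize bytes each. */
static void show_chunks(const char *label, int total, int rsize)
{
        printf("%s:", label);
        for (int done = 0; done < total; done += rsize) {
                int n = total - done < rsize ? total - done : rsize;
                printf(" %d", n);
        }
        printf("\n");
}

int main(void)
{
        show_chunks("rsize=8168 (before the fix)", 8192, 8168); /* 8168 24  */
        show_chunks("rsize=4096 (after the fix)", 8192, 4096);  /* 4096 4096 */
        return 0;
}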

Thanks,
Jevon

