From: Jevon Qiao <scaleqiao@gmail.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
qemu-devel@nongnu.org,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Cc: sage@newdream.net, haomaiwang@gmail.com,
gkurz@linux.vnet.ibm.com, gfarnum@redhat.com, mst@redhat.com
Subject: Re: [PATCH 2/2] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize
Date: Fri, 19 Feb 2016 16:56:00 +0800
Message-ID: <56C6D8A0.2040502@gmail.com>
In-Reply-To: <87h9h7z9a0.fsf@linux.vnet.ibm.com>
Hi Aneesh,
> I am not sure I understand the details correctly. iounit is the size
> that we use in client_read to determine the size in which
> we should request I/O from the client. But we still can't do I/O in size
> larger than s->msize. If you look at the client side (kernel 9p fs), you
> will find
>
> rsize = fid->iounit;
> if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
> rsize = clnt->msize - P9_IOHDRSZ;
Yes, I know this.
> if your iounit calculation ends up zero, that should be handled
> correctly by
>
> if (!iounit) {
> iounit = s->msize - P9_IOHDRSZ;
> }
> return iounit;
>
>
> So what is the issue here. ?
This results in an alignment issue when the I/O requested by the client
is mapped into pages in p9_nr_pages():
int p9_nr_pages(char *data, int len)
{
        unsigned long start_page, end_page;
        start_page = (unsigned long)data >> PAGE_SHIFT;
        end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
        return end_page - start_page;
}
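To make the page arithmetic concrete, here is a minimal userspace sketch of the same calculation. It assumes 4 KiB pages (PAGE_SHIFT = 12, matching the ">> 12" used in the systemtap script in this mail); the name sketch_nr_pages and the example addresses are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on the x86_64 guest used in this mail. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Userspace copy of the p9_nr_pages() arithmetic: number of pages
 * touched by a buffer of `len` bytes starting at address `addr`.
 */
static unsigned long sketch_nr_pages(uintptr_t addr, size_t len)
{
    unsigned long start_page = addr >> SKETCH_PAGE_SHIFT;
    unsigned long end_page =
        (addr + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

    return end_page - start_page;
}
```

With these definitions, sketch_nr_pages(0x1000, 8168) gives 2 for a page-aligned buffer, sketch_nr_pages(0x1800, 8168) gives 3 once the buffer starts in the middle of a page, and sketch_nr_pages(0x1000, 4096) gives 1.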
Please see the following experiment I did without the fix.
1) Start qemu with cephfs,
$ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2 \
    -smp 4 -m 4096 \
    -fsdev cephfs,security_model=passthrough,id=fsdev0,path=/ \
    -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs \
    --enable-kvm -nographic \
    -net nic -net tap,ifname=tap0,script=no,downscript=no
2) Mount the fs in the guest.
[root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L cephfs /mnt
[root@localhost ~]# ls -lah /mnt/8kfile
-rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile
In this case, I used the default msize, which is 8192 bytes. Since cephfs
reports 4M as its f_bsize, which is larger than msize, the iounit
calculation yields zero and falls back to msize - P9_IOHDRSZ. As
P9_IOHDRSZ is equal to 24, the iounit will be 8168.
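The arithmetic can be sketched as follows. This is not the QEMU code itself: it assumes the server rounds msize - P9_IOHDRSZ down to a multiple of f_bsize before applying the zero fallback quoted earlier in this mail, and the function name sketch_iounit is hypothetical.

```c
#include <stdint.h>

#define P9_IOHDRSZ 24  /* value stated in this mail */

/*
 * Assumed calculation: the largest multiple of f_bsize that fits into
 * the payload (msize - P9_IOHDRSZ), or the whole payload when not even
 * one block fits.
 */
static int32_t sketch_iounit(int32_t msize, int64_t f_bsize)
{
    int32_t payload = msize - P9_IOHDRSZ;
    int32_t iounit = 0;

    if (f_bsize > 0) {
        /* The integer division is 0 when f_bsize > payload. */
        iounit = (int32_t)(f_bsize * (payload / f_bsize));
    }
    if (!iounit) {
        iounit = payload;  /* fallback quoted by Aneesh */
    }
    return iounit;
}
```

Under that assumption, sketch_iounit(8192, 4194304) returns 8168, the unaligned size seen in the trace, while a 4096-byte block size would give 4096.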
3) Run the following systemtap script to trace the paging result,
[root@localhost ~]# cat p9_read.stp
probe kernel.function("p9_virtio_zc_request").call
{
        printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
}
probe kernel.function("p9_nr_pages").call
{
        printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
        printf("p9_nr_pages: end_age = %ld\n", (int_arg(1) + 8168 + 4096 - 1) >> 12);
}
4) The output I got when I copied the file /mnt/8kfile to the /tmp/
directory,
p9_virtio_zc_request: inlen size is 8168
p9_nr_pages: start_page = 34293757815
p9_nr_pages: end_age = 34293757818
Per the output above (start_page = 34293757815, end_page = 34293757818),
the 8k of data is mapped onto three pages instead of two. This could hurt
performance.
I also enabled the cephfs debug output that I added, to see how the data
is distributed in this case. The result is as follows,
CEPHFS_DEBUG: cephfs_preadv iov_len=4096
CEPHFS_DEBUG: cephfs_preadv iov_len=4072
CEPHFS_DEBUG: cephfs_preadv iov_len=24
This patch aims to fix this. With the patch applied, the result shows that
it works quite well: all the data is page-aligned.
p9_virtio_zc_request: inlen size is 4096
p9_nr_pages: start_page = 34203171814
p9_nr_pages: end_age = 34203171815
p9_virtio_zc_request: inlen size is 4096
p9_nr_pages: start_page = 34203171815
p9_nr_pages: end_age = 34203171816
CEPHFS_DEBUG: cephfs_preadv iov_len=4096
CEPHFS_DEBUG: cephfs_preadv iov_len=4096
Thanks,
Jevon
> -aneesh
>