CEPH filesystem development
From: Jevon Qiao <scaleqiao@gmail.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	qemu-devel@nongnu.org,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Cc: sage@newdream.net, haomaiwang@gmail.com,
	gkurz@linux.vnet.ibm.com, gfarnum@redhat.com, mst@redhat.com
Subject: Re: [PATCH 2/2] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize
Date: Fri, 19 Feb 2016 16:56:00 +0800	[thread overview]
Message-ID: <56C6D8A0.2040502@gmail.com> (raw)
In-Reply-To: <87h9h7z9a0.fsf@linux.vnet.ibm.com>


Hi Aneesh,
> I am not sure I understand the details correctly. iounit is the size
> that we use in client_read to determine the  size in which
> we should request I/O from the client. But we still can't do I/O in size
> larger than s->msize. If you look at the client side (kernel 9p fs), you
> will find
>
> 	rsize = fid->iounit;
> 	if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
> 		rsize = clnt->msize - P9_IOHDRSZ;
Yes, I know this.
> if your iounit calculation ends up zero, that should be handled
> correctly by
>
>      if (!iounit) {
>          iounit = s->msize - P9_IOHDRSZ;
>      }
>      return iounit;
>
>
> So what is the issue here. ?
This results in an alignment issue when the I/O requested by the
client is mapped into pages in p9_nr_pages():

    int p9_nr_pages(char *data, int len)
    {
            unsigned long start_page, end_page;
            start_page = (unsigned long)data >> PAGE_SHIFT;
            end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
            return end_page - start_page;
    }

Please see the following experiment I did without the fix.

1) Start qemu with cephfs,

    $ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2
    -smp 4 -m 4096 -fsdev
    cephfs,security_model=passthrough,id=fsdev0,path=/ -device
    virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs --enable-kvm
    -nographic -net nic -net tap,ifname=tap0,script=no,downscript=no


2) Mount the fs in the guest.

    [root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L
    cephfs /mnt
    [root@localhost ~]# ls -lah /mnt/8kfile
    -rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile

In this case I used the default msize, which is 8192 bytes. Since cephfs
reports 4M as the f_bsize, the iounit ends up as 8168, because P9_IOHDRSZ
is equal to 24 (8192 - 24 = 8168).

3) Run the following SystemTap script to trace the paging result:

    [root@localhost ~]# cat p9_read.stp
    probe kernel.function("p9_virtio_zc_request").call
    {
         printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
    }

    probe kernel.function("p9_nr_pages").call
    {
         printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
         printf("p9_nr_pages: end_page = %ld\n", (int_arg(1) + 8168 + 4096 - 1) >> 12);
    }

4) The output I got when copying /mnt/8kfile to the /tmp/ directory:

    p9_virtio_zc_request: inlen size is 8168
    p9_nr_pages: start_page = 34293757815
    p9_nr_pages: end_page = 34293757818

Per the trace above (start_page = 34293757815, end_page = 34293757818),
the 8K of data ends up mapped into three pages, which hurts performance.

I also enabled the cephfs debug output I had added, to see how the data
is distributed in this case; the result is as follows:

    CEPHFS_DEBUG: cephfs_preadv iov_len=4096
    CEPHFS_DEBUG: cephfs_preadv iov_len=4072
    CEPHFS_DEBUG: cephfs_preadv iov_len=24

This patch fixes the issue, and the result shows it works quite well:
all the data is well aligned.

    p9_virtio_zc_request: inlen size is 4096
    p9_nr_pages: start_page = 34203171814
    p9_nr_pages: end_page = 34203171815
    p9_virtio_zc_request: inlen size is 4096
    p9_nr_pages: start_page = 34203171815
    p9_nr_pages: end_page = 34203171816

    CEPHFS_DEBUG: cephfs_preadv iov_len=4096
    CEPHFS_DEBUG: cephfs_preadv iov_len=4096

Thanks,
Jevon
> -aneesh
>



