All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: qemu-devel@nongnu.org
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	"Venegas Munoz,
	Jose Carlos" <jose.carlos.venegas.munoz@intel.com>,
	virtio-fs-list <virtio-fs@redhat.com>,
	"Shinde, Archana M" <archana.m.shinde@intel.com>,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [Virtio-fs] [PATCH] virtiofsd: Use --thread-pool-size=0 to mean no thread pool
Date: Tue, 17 Nov 2020 17:39:16 +0100	[thread overview]
Message-ID: <5082873.Uf8toegZhN@silver> (raw)
In-Reply-To: <C5CAD300-DB1D-4C6A-A2D3-2E6BA9F901DF@intel.com>

On Dienstag, 17. November 2020 17:00:09 CET Venegas Munoz, Jose Carlos wrote:
> >   Not sure what the default is for 9p, but comparing
> >   default to default will definitely not be apples to apples since this
> >   mode is nonexistent in 9p.
> 
> 
> In Kata we are looking for the best config for fs compatibility and
> performance. So even if are no apples to apples, we are for the best
> config for both and comparing the best that each of them can do. 
> In the case of Kata for 9pfs(this is the config we have found has better
> performance and fs compatibility  in general) we have:
> ```
> -device virtio-9p-pci # device type
> ,disable-modern=false 
> ,fsdev=extra-9p-kataShared # attr: device id  for fsdev
> ,mount_tag=kataShared  # attr: tag on how will be found de sharedfs 
> ,romfile= 
> -fsdev local  #local: Simply lets QEMU call the individual VFS functions
> (more or less) directly on host.
> ,id=extra-9p-kataShared
> ,path=${SHARED_PATH} # attrs: path to share
> ,security_model=none #    
> #    passthrough: Files are stored using the same credentials as they are
> created on the guest. This requires QEMU to run as # root.
> #    none: Same
> as "passthrough" except the sever won't report failures if it fails to set
> file attributes like ownership # #  (chown). This makes a passthrough like
> security model usable for people who run kvm as non root. ,multidevs=remap
> ```
> 
> The mount options are:
> ```
> trans=virtio 
>     ,version=9p2000.L 
>     ,cache=mmap 
>     ,"nodev" # Security: The nodev mount option specifies that the
> filesystem cannot contain special devices. 
> ,"msize=8192" # msize: Maximum

Too small. You should definitely set 'msize' larger, as small a msize value 
causes 9p requests to be broken down into more and smaller 9p requests:
https://wiki.qemu.org/Documentation/9psetup#msize

Setting msize to a fairly high value might also substantially increase I/O 
performance i.e. on large I/O if guest application honors st_blksize returned 
by calling *stat() [ https://man7.org/linux/man-pages/man2/stat.2.html ]:

       st_blksize
              This field gives the "preferred" block size for efficient
              filesystem I/O.

For instance 'time cat /large/file' would be substantially faster, as it 
automatically adapts the chunk size.

> packet size including any headers. ```
> 
> Aditionally we use this patch
> https://github.com/kata-containers/packaging/blob/stable-1.12/qemu/patches/
> 5.0.x/0001-9p-removing-coroutines-of-9p-to-increase-the-I-O-per.patch

Interesting radical approach. :)

Yeah, the coroutine code in 9pfs is still a huge bottleneck. The problem of 
the existing coroutine code is that it dispatches too often between threads 
(i.e. main I/O thread of 9p server and background I/O threads doing the fs 
driver work) which causes huge latencies.

I started to fix that, yet I only completed the Treaddir request handler so 
far, which however gave a huge performance boost on such Treaddir requests:
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html

The plan is to do the same for the other 9p request handlers, and finally 
introducing an 'order=relaxed' option (right now the order is strict only).

That should end up being faster than your current no-coroutine hack later on, 
because you would have the benefits of both worlds: low latency and 
parallelism for the fs driver I/O work.

> In kata for virtiofs I am testing  use:
> ```
> -chardev socket
> ,id=ID-SOCKET
> ,path=.../vhost-fs.sock  # Path to vhost socket
> -device vhost-user-fs-pci #
> ,chardev=ID-SOCKET 
> ,tag=kataShared 
> ,romfile=
> 
>  -object memory-backend-file # force use of memory sharable with virtiofsd.
>  ,id=dimm1
>  ,size=2048M
>  ,mem-path=/dev/shm 
>  ,share=on
> ```
> Virtiofsd:
> ```
> -o cache=auto 
> -o no_posix_lock # enable/disable remote posix lock
> --thread-pool-size=0
> ```
> 
> And virtiofs mount options:
> ```
> source:\"kataShared\" 
> fstype:\"virtiofs\"
> ```
> 
> With this patch, comparing this two configurations, I have seen better
> performance with virtiofs in different hosts.
> -
> Carlos
> 
> -
> 
> On 12/11/20 3:06, "Miklos Szeredi" <mszeredi@redhat.com> wrote:
> 
>     On Fri, Nov 6, 2020 at 11:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> 
>     >
>     >
>     > On Fri, Nov 06, 2020 at 08:33:50PM +0000, Venegas Munoz, Jose Carlos
>     > wrote:
>     > > 
>     > > Hi Vivek,
>     > >
>     > >
>     > >
>     > > I have tested with Kata 1.12-apha0, the results seems that are
>     > > better for the use fio config I am tracking.
>     > > > >
>     > >
>     > >
>     > > The fio config does  randrw:
>     > >
>     > >
>     > >
>     > > fio --direct=1 --gtod_reduce=1 --name=test
>     > > --filename=random_read_write.fio --bs=4k --iodepth=64 --size=200M
>     > > --readwrite=randrw --rwmixread=75
>     > > > >
>     > >
>     >
>     >
>     >
>     > Hi Carlos,
>     >
>     >
>     >
>     > Thanks for the testing.
>     >
>     >
>     >
>     > So basically two conclusions from your tests.
>     >
>     >
>     >
>     > - for virtiofs, --thread-pool-size=0 is performing better as comapred
>     > 
>     >   to --thread-pool-size=1 as well as --thread-pool-size=64.
>     >   Approximately
>     >   35-40% better.
>     >
>     >
>     >
>     > - virtio-9p is still approximately 30% better than virtiofs
>     > 
>     >   --thread-pool-size=0.
>     >
>     >
>     >
>     > As I had done the analysis that this particular workload (mixed read
>     > and
>     > write) is bad with virtiofs because after every write we are
>     > invalidating
>     > attrs and cache so next read ends up fetching attrs again. I had
>     > posted
>     > patches to gain some of the performance.
>     >
>     >
>     >
>     > https://lore.kernel.org/linux-fsdevel/20200929185015.GG220516@redhat.c
>     > om/
>     >
>     >
>     >
>     > But I got the feedback to look into implementing file leases instead.
> 
> 
>     Hmm, the FUSE_AUTO_INVAL_DATA feature is buggy, how about turning it
>     off for now?   9p doesn't have it, so no point in enabling it for
>     virtiofs by default.
> 
>     Also I think some confusion comes from cache=auto being the default
>     for virtiofs.    Not sure what the default is for 9p, but comparing
>     default to default will definitely not be apples to apples since this
>     mode is nonexistent in 9p.
> 
>     9p:cache=none  <-> virtiofs:cache=none
>     9p:cache=loose <-> virtiofs:cache=always
> 
>     "9p:cache=mmap" and "virtiofs:cache=auto" have no match.
> 
>     Untested patch attached.
> 
>     Thanks,
>     Miklos
> 

Best regards,
Christian Schoenebeck




WARNING: multiple messages have this Message-ID (diff)
From: Christian Schoenebeck <qemu_oss@crudebyte.com>
To: qemu-devel@nongnu.org
Cc: "Venegas Munoz,
	Jose Carlos" <jose.carlos.venegas.munoz@intel.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	virtio-fs-list <virtio-fs@redhat.com>,
	"Shinde, Archana M" <archana.m.shinde@intel.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [Virtio-fs] [PATCH] virtiofsd: Use --thread-pool-size=0 to mean no thread pool
Date: Tue, 17 Nov 2020 17:39:16 +0100	[thread overview]
Message-ID: <5082873.Uf8toegZhN@silver> (raw)
In-Reply-To: <C5CAD300-DB1D-4C6A-A2D3-2E6BA9F901DF@intel.com>

On Dienstag, 17. November 2020 17:00:09 CET Venegas Munoz, Jose Carlos wrote:
> >   Not sure what the default is for 9p, but comparing
> >   default to default will definitely not be apples to apples since this
> >   mode is nonexistent in 9p.
> 
> 
> In Kata we are looking for the best config for fs compatibility and
> performance. So even if are no apples to apples, we are for the best
> config for both and comparing the best that each of them can do. 
> In the case of Kata for 9pfs(this is the config we have found has better
> performance and fs compatibility  in general) we have:
> ```
> -device virtio-9p-pci # device type
> ,disable-modern=false 
> ,fsdev=extra-9p-kataShared # attr: device id  for fsdev
> ,mount_tag=kataShared  # attr: tag on how will be found de sharedfs 
> ,romfile= 
> -fsdev local  #local: Simply lets QEMU call the individual VFS functions
> (more or less) directly on host.
> ,id=extra-9p-kataShared
> ,path=${SHARED_PATH} # attrs: path to share
> ,security_model=none #    
> #    passthrough: Files are stored using the same credentials as they are
> created on the guest. This requires QEMU to run as # root.
> #    none: Same
> as "passthrough" except the sever won't report failures if it fails to set
> file attributes like ownership # #  (chown). This makes a passthrough like
> security model usable for people who run kvm as non root. ,multidevs=remap
> ```
> 
> The mount options are:
> ```
> trans=virtio 
>     ,version=9p2000.L 
>     ,cache=mmap 
>     ,"nodev" # Security: The nodev mount option specifies that the
> filesystem cannot contain special devices. 
> ,"msize=8192" # msize: Maximum

Too small. You should definitely set 'msize' larger, as small a msize value 
causes 9p requests to be broken down into more and smaller 9p requests:
https://wiki.qemu.org/Documentation/9psetup#msize

Setting msize to a fairly high value might also substantially increase I/O 
performance i.e. on large I/O if guest application honors st_blksize returned 
by calling *stat() [ https://man7.org/linux/man-pages/man2/stat.2.html ]:

       st_blksize
              This field gives the "preferred" block size for efficient
              filesystem I/O.

For instance 'time cat /large/file' would be substantially faster, as it 
automatically adapts the chunk size.

> packet size including any headers. ```
> 
> Aditionally we use this patch
> https://github.com/kata-containers/packaging/blob/stable-1.12/qemu/patches/
> 5.0.x/0001-9p-removing-coroutines-of-9p-to-increase-the-I-O-per.patch

Interesting radical approach. :)

Yeah, the coroutine code in 9pfs is still a huge bottleneck. The problem of 
the existing coroutine code is that it dispatches too often between threads 
(i.e. main I/O thread of 9p server and background I/O threads doing the fs 
driver work) which causes huge latencies.

I started to fix that, yet I only completed the Treaddir request handler so 
far, which however gave a huge performance boost on such Treaddir requests:
https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html

The plan is to do the same for the other 9p request handlers, and finally 
introducing an 'order=relaxed' option (right now the order is strict only).

That should end up being faster than your current no-coroutine hack later on, 
because you would have the benefits of both worlds: low latency and 
parallelism for the fs driver I/O work.

> In kata for virtiofs I am testing  use:
> ```
> -chardev socket
> ,id=ID-SOCKET
> ,path=.../vhost-fs.sock  # Path to vhost socket
> -device vhost-user-fs-pci #
> ,chardev=ID-SOCKET 
> ,tag=kataShared 
> ,romfile=
> 
>  -object memory-backend-file # force use of memory sharable with virtiofsd.
>  ,id=dimm1
>  ,size=2048M
>  ,mem-path=/dev/shm 
>  ,share=on
> ```
> Virtiofsd:
> ```
> -o cache=auto 
> -o no_posix_lock # enable/disable remote posix lock
> --thread-pool-size=0
> ```
> 
> And virtiofs mount options:
> ```
> source:\"kataShared\" 
> fstype:\"virtiofs\"
> ```
> 
> With this patch, comparing this two configurations, I have seen better
> performance with virtiofs in different hosts.
> -
> Carlos
> 
> -
> 
> On 12/11/20 3:06, "Miklos Szeredi" <mszeredi@redhat.com> wrote:
> 
>     On Fri, Nov 6, 2020 at 11:35 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> 
>     >
>     >
>     > On Fri, Nov 06, 2020 at 08:33:50PM +0000, Venegas Munoz, Jose Carlos
>     > wrote:
>     > > 
>     > > Hi Vivek,
>     > >
>     > >
>     > >
>     > > I have tested with Kata 1.12-apha0, the results seems that are
>     > > better for the use fio config I am tracking.
>     > > > >
>     > >
>     > >
>     > > The fio config does  randrw:
>     > >
>     > >
>     > >
>     > > fio --direct=1 --gtod_reduce=1 --name=test
>     > > --filename=random_read_write.fio --bs=4k --iodepth=64 --size=200M
>     > > --readwrite=randrw --rwmixread=75
>     > > > >
>     > >
>     >
>     >
>     >
>     > Hi Carlos,
>     >
>     >
>     >
>     > Thanks for the testing.
>     >
>     >
>     >
>     > So basically two conclusions from your tests.
>     >
>     >
>     >
>     > - for virtiofs, --thread-pool-size=0 is performing better as comapred
>     > 
>     >   to --thread-pool-size=1 as well as --thread-pool-size=64.
>     >   Approximately
>     >   35-40% better.
>     >
>     >
>     >
>     > - virtio-9p is still approximately 30% better than virtiofs
>     > 
>     >   --thread-pool-size=0.
>     >
>     >
>     >
>     > As I had done the analysis that this particular workload (mixed read
>     > and
>     > write) is bad with virtiofs because after every write we are
>     > invalidating
>     > attrs and cache so next read ends up fetching attrs again. I had
>     > posted
>     > patches to gain some of the performance.
>     >
>     >
>     >
>     > https://lore.kernel.org/linux-fsdevel/20200929185015.GG220516@redhat.c
>     > om/
>     >
>     >
>     >
>     > But I got the feedback to look into implementing file leases instead.
> 
> 
>     Hmm, the FUSE_AUTO_INVAL_DATA feature is buggy, how about turning it
>     off for now?   9p doesn't have it, so no point in enabling it for
>     virtiofs by default.
> 
>     Also I think some confusion comes from cache=auto being the default
>     for virtiofs.    Not sure what the default is for 9p, but comparing
>     default to default will definitely not be apples to apples since this
>     mode is nonexistent in 9p.
> 
>     9p:cache=none  <-> virtiofs:cache=none
>     9p:cache=loose <-> virtiofs:cache=always
> 
>     "9p:cache=mmap" and "virtiofs:cache=auto" have no match.
> 
>     Untested patch attached.
> 
>     Thanks,
>     Miklos
> 

Best regards,
Christian Schoenebeck




  reply	other threads:[~2020-11-17 16:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-05 19:44 [Virtio-fs] [PATCH] virtiofsd: Use --thread-pool-size=0 to mean no thread pool Vivek Goyal
2020-11-05 19:44 ` Vivek Goyal
2020-11-05 19:52 ` [Virtio-fs] " Vivek Goyal
2020-11-05 19:52   ` Vivek Goyal
2020-11-06 20:33   ` Venegas Munoz, Jose Carlos
2020-11-06 20:33     ` Venegas Munoz, Jose Carlos
2020-11-06 22:35     ` Vivek Goyal
2020-11-06 22:35       ` Vivek Goyal
2020-11-09 10:02       ` Dr. David Alan Gilbert
2020-11-09 10:02         ` Dr. David Alan Gilbert
2020-11-09 14:39         ` Vivek Goyal
2020-11-09 14:39           ` Vivek Goyal
2020-11-12  9:06       ` Miklos Szeredi
2020-11-12  9:57         ` Miklos Szeredi
2020-11-18 20:19           ` Vivek Goyal
2020-11-12 11:00         ` Christian Schoenebeck
2020-11-12 11:00           ` Christian Schoenebeck
2020-11-12 13:01           ` Miklos Szeredi
2020-11-17 16:00         ` Venegas Munoz, Jose Carlos
2020-11-17 16:39           ` Christian Schoenebeck [this message]
2020-11-17 16:39             ` Christian Schoenebeck
2020-11-17 18:55           ` Vivek Goyal
2020-11-17 18:55             ` Vivek Goyal
2020-11-23 15:06             ` Venegas Munoz, Jose Carlos
2020-11-23 15:06               ` Venegas Munoz, Jose Carlos
2020-11-17 22:24           ` Vivek Goyal
2020-11-17 22:24             ` Vivek Goyal
2020-11-23 15:07             ` Venegas Munoz, Jose Carlos
2020-11-23 15:07               ` Venegas Munoz, Jose Carlos
2020-11-18 19:51         ` Vivek Goyal
2020-11-23  9:46           ` Miklos Szeredi
2020-11-23  9:46             ` Miklos Szeredi
2020-11-09 10:11 ` Dr. David Alan Gilbert
2020-11-09 10:11   ` Dr. David Alan Gilbert
2020-11-09 14:40   ` [Virtio-fs] " Vivek Goyal
2020-11-09 14:40     ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5082873.Uf8toegZhN@silver \
    --to=qemu_oss@crudebyte.com \
    --cc=archana.m.shinde@intel.com \
    --cc=jose.carlos.venegas.munoz@intel.com \
    --cc=miklos@szeredi.hu \
    --cc=qemu-devel@nongnu.org \
    --cc=vgoyal@redhat.com \
    --cc=virtio-fs@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.