* Re: Performance Issue with CXL-emulation
From: Jonathan Cameron @ 2023-10-16 9:55 UTC
To: lokesh jaliminche; +Cc: qemu-devel-request, linux-cxl, qemu-devel
On Sun, 15 Oct 2023 10:39:46 -0700
lokesh jaliminche <lokesh.jaliminche@gmail.com> wrote:
> Hi Everyone,
>
> I am facing performance issues while copying data to the CXL device
> (Emulated with QEMU). I get approximately 500KB/Sec. Any suggestion on how
> to improve this?
Hi Lokesh,
The focus of the QEMU emulation of CXL devices so far has been functionality.
I'm in favour of work to improve on that, but it isn't likely to be my focus
- I can offer some pointers on where to look though!
The fundamental problem (probably) is that address decoding for CXL interleaving
is done at sub-page granularity. That means we can't use the page tables to perform
the address lookups in hardware. Note this also has the side effect that KVM won't
work if there is any chance that you will run instructions out of the CXL memory -
it's fine if you are interested in data only (DAX etc.). (I've had a note on my todo
list for a while to add a warning message about the KVM limitations.)
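To make the sub-page problem concrete, here is the power-of-2 interleave maths in C
(an illustrative sketch of my own, ignoring the XOR-based and non-power-of-2 variants
the spec also allows):

  #include <stdint.h>

  /* ig = log2(interleave granularity), iw = log2(interleave ways).
   * At ig = 10 (1KiB chunks) with two or more ways, a single 4KiB page
   * spans several devices, so no single page table entry can map it. */
  static unsigned target_index(uint64_t hpa, unsigned ig, unsigned iw)
  {
      return (hpa >> ig) & ((1u << iw) - 1);
  }

  static uint64_t hpa_to_dpa(uint64_t hpa, unsigned ig, unsigned iw)
  {
      /* Drop the interleave-ways bits out of the middle of the address. */
      return ((hpa >> (ig + iw)) << ig) | (hpa & ((1ull << ig) - 1));
  }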
There have been a few discussions (mostly when we were debugging some TCG issues
and considering KVM support) about how we 'might' be able to improve this. Those focused
on a general 'fix', but there may be some lower-hanging fruit.
The options I think might work are:
1) Special-case configurations where no interleave is going on.
I'm not entirely sure how this would fit together, and it won't deal with the
more interesting cases - if it does work I'd want it to be minimally invasive, because
those complex cases are the main focus of testing etc. There is an extension of this
where we handle interleave, but only at 4k granularity or above (on an appropriately
configured host).
2) Add a caching layer to the CXL fixed memory windows. That would hold copies of a
number of recently accessed pages in a software cache and set up the mappings for
the hardware page-table walkers to find them. If a page isn't cached we'd trigger
a page fault and have to bring it into the cache. If the configuration of the interleave
is touched, all caches would need to be written back etc. This would need to be optional,
because I don't want to have to add cache coherency protocols etc. when we add shared
memory support (fun though it would be ;)
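Roughly, the per-window state might look like this (names invented purely for
illustration - nothing like this exists in the tree):

  #include <stdbool.h>
  #include <stdint.h>

  /* Sketch of one entry in a per fixed-memory-window software cache. */
  typedef struct CFMWSCachePage {
      uint64_t hpa;      /* guest HPA of the cached page */
      void *host_page;   /* page-sized buffer the HW walkers can map */
      bool dirty;        /* must be written back to the real targets */
      bool valid;
  } CFMWSCachePage;

  /* On a miss: decode hpa -> (device, dpa), copy the page in, map it,
   * mark it valid. On any decoder reconfiguration: write back every
   * dirty page and invalidate the lot. */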
3) It might be worth looking at the critical paths for lookups in your configuration.
Maybe we can optimize the address decoders (basically a software TLB for HPA to DPA).
I've not looked at the performance of those paths. For your example the lookup is:
* CFMWS - nothing to do.
* Host bridge - nothing to do beyond a sanity check on range, I think.
* Root port - nothing to do.
* Type 3 device - basic range match.
So I'm not sure it is worthwhile - but you could do a really simple test: detect that
no interleave is going on and cache the offset needed to go from HPA to DPA, plus a
device reference, the first time cxl_cfmws_find_device() is called.
https://elixir.bootlin.com/qemu/latest/source/hw/cxl/cxl-host.c#L129
Then just match on hwaddr on subsequent calls of cxl_cfmws_find_device() and return the
device directly. Maybe also shortcut lookups in cxl_type3_hpa_to_as_and_dpa(), which does
the endpoint decoding part. A quick hack would let you know whether it is worth looking at
something more general.
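To sketch the level of hack I mean (untested; cxl_cfmws_find_device_slow() is an
invented stand-in for the existing walk, a single global entry is only enough for a
one-window setup like yours, and the field names should be checked against the real
code):

  static PCIDevice *cached_dev;
  static bool cache_valid;

  PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
  {
      PCIDevice *d;

      /* No interleave: the window always resolves to the same device. */
      if (fw->num_targets == 1 && cache_valid) {
          return cached_dev;
      }

      d = cxl_cfmws_find_device_slow(fw, addr); /* the existing decoder walk */

      if (d && fw->num_targets == 1) {
          cached_dev = d;
          cache_valid = true; /* clear this if the decoders are reprogrammed */
      }
      return d;
  }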
My gut feeling is this last approach might get you some perf uptick, but it is not going
to solve the fundamental problem that in general we can't do the translation in hardware
(unlike most other memory accesses in QEMU).
Note that I believe all writes to file-backed memory will go all the way to the file.
So you might want to try backing it with RAM but, as with the above, that's not going
to address the fundamental problem.
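For the RAM-backed experiment that is just a change to the backend object, something
like the following (I've not tested memory-backend-ram with a persistent-memdev, so
treat it as a starting point - and you obviously lose the contents across runs):

  -object memory-backend-ram,id=cxl-mem1,size=256M \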
Jonathan
>
> Steps to reproduce:
> ===============
> 1. QEMU Command:
> sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> -hda ./images/ubuntu-22.04-server-cloudimg-amd64.img \
> -hdb ./images/user-data.img \
> -M q35,cxl=on,accel=kvm,nvdimm=on \
> -smp 16 \
> -m 16G,maxmem=32G,slots=8 \
> -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/mnt/qemu_files/cxltest.raw,size=256M \
> -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/mnt/qemu_files/lsa.raw,size=256M \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G \
> -nographic \
>
> 2. Configure device with fsdax mode
> ubuntu@ubuntu:~$ cxl list
> [
>   {
>     "memdevs":[
>       {
>         "memdev":"mem0",
>         "pmem_size":268435456,
>         "serial":0,
>         "host":"0000:0d:00.0"
>       }
>     ]
>   },
>   {
>     "regions":[
>       {
>         "region":"region0",
>         "resource":45365592064,
>         "size":268435456,
>         "type":"pmem",
>         "interleave_ways":1,
>         "interleave_granularity":1024,
>         "decode_state":"commit"
>       }
>     ]
>   }
> ]
>
> 3. Format the device with ext4 file system in dax mode
>
> 4. Write data to mounted device with dd
>
> ubuntu@ubuntu:~$ time sudo dd if=/dev/urandom of=/home/ubuntu/mnt/pmem0/test bs=1M count=128
> 128+0 records in
> 128+0 records out
> 134217728 bytes (134 MB, 128 MiB) copied, 244.802 s, 548 kB/s
>
> real 4m4.850s
> user 0m0.014s
> sys 0m0.013s
>
>
> Thanks & Regards,
> Lokesh
>
* Re: Performance Issue with CXL-emulation
From: lokesh jaliminche @ 2023-10-16 22:37 UTC
To: Jonathan Cameron; +Cc: qemu-devel-request, linux-cxl, qemu-devel
Hi Jonathan,
Thanks for your quick and detailed response. I'll explore these
options further and assess whether I get any performance uptick.
Thanks & Regards,
Lokesh
On Mon, Oct 16, 2023 at 2:56 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
> [snip]