qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
@ 2025-11-19 17:28 konrad.schwarz
  2025-11-21 10:16 ` Jan Kiszka
  2025-11-21 12:25 ` Paolo Bonzini
  0 siblings, 2 replies; 8+ messages in thread
From: konrad.schwarz @ 2025-11-19 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: philmd, Konrad Schwarz, jan.kiszka

The mkemmc.sh script calculates file sizes via `wc -c'.  `wc'
normally works by reading the entire file, resulting in O(n) performance.

Unix file systems obviously know a file's size and POSIX `ls' reports this
information unambiguously, so replacing `wc' with `ls' ensures O(1)
performance.  The files in question tend to be large making this change
worthwhile.

Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
---
 scripts/mkemmc.sh | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
index 45dc3f08fa..d2c4e84b16 100755
--- a/scripts/mkemmc.sh
+++ b/scripts/mkemmc.sh
@@ -37,13 +37,19 @@ usage() {
     exit "$1"
 }

+file_size() {
+	ls_line=$(ls -Hdog "$1") || return
+	printf %s\\n "$ls_line" | cut -d\  -f3
+	unset ls_line
+}
+
 process_size() {
     name=$1
     image_file=$2
     alignment=$3
     image_arg=$4
     if [ "${image_arg#*:}" = "$image_arg"  ]; then
-        if ! size=$(wc -c < "$image_file" 2>/dev/null); then
+        if ! size=$(file_size "$image_file"); then
             echo "Missing $name image '$image_file'." >&2
             exit 1
         fi
@@ -105,7 +111,7 @@ check_truncation() {
     if [ "$image_file" = "/dev/zero" ]; then
         return
     fi
-    if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
+    if ! actual_size=$(file_size "$image_file"); then
         echo "Missing image '$image_file'." >&2
         exit 1
     fi
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
  2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz
@ 2025-11-21 10:16 ` Jan Kiszka
  2025-11-21 11:09   ` Daniel P. Berrangé
  2025-11-21 13:32   ` Schwarz, Konrad
  2025-11-21 12:25 ` Paolo Bonzini
  1 sibling, 2 replies; 8+ messages in thread
From: Jan Kiszka @ 2025-11-21 10:16 UTC (permalink / raw)
  To: konrad.schwarz, qemu-devel; +Cc: philmd, Konrad Schwarz

You want to add

From: Konrad Schwarz <konrad.schwarz@siemens.com>

to the top of the commit message when submitting via a different account
so that author and signer are aligned. Or use git sendemail which will
do that when author and submitter address differ.

On 19.11.25 18:28, konrad.schwarz@gmail.com wrote:
> The mkemmc.sh script calculates file sizes via `wc -c'.  `wc'
> normally works by reading the entire file, resulting in O(n) performance.
> 
> Unix file systems obviously know a file's size and POSIX `ls' reports this
> information unambiguously, so replacing `wc' with `ls' ensures O(1)
> performance.  The files in question tend to be large making this change
> worthwhile.

Valid point.

> 
> Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
> ---
>  scripts/mkemmc.sh | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
> index 45dc3f08fa..d2c4e84b16 100755
> --- a/scripts/mkemmc.sh
> +++ b/scripts/mkemmc.sh
> @@ -37,13 +37,19 @@ usage() {
>      exit "$1"
>  }
> 
> +file_size() {
> +	ls_line=$(ls -Hdog "$1") || return

This will not suppress the error message when a file does not exist or
is not accessible, so:

ls_line=$(ls -Hdog "$1" 2>/dev/null) || return

> +	printf %s\\n "$ls_line" | cut -d\  -f3
> +	unset ls_line
> +}
> +
>  process_size() {
>      name=$1
>      image_file=$2
>      alignment=$3
>      image_arg=$4
>      if [ "${image_arg#*:}" = "$image_arg"  ]; then
> -        if ! size=$(wc -c < "$image_file" 2>/dev/null); then
> +        if ! size=$(file_size "$image_file"); then
>              echo "Missing $name image '$image_file'." >&2
>              exit 1
>          fi
> @@ -105,7 +111,7 @@ check_truncation() {
>      if [ "$image_file" = "/dev/zero" ]; then
>          return
>      fi
> -    if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
> +    if ! actual_size=$(file_size "$image_file"); then
>          echo "Missing image '$image_file'." >&2
>          exit 1
>      fi

Thanks,
Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
  2025-11-21 10:16 ` Jan Kiszka
@ 2025-11-21 11:09   ` Daniel P. Berrangé
  2025-11-21 14:07     ` Schwarz, Konrad
  2025-11-21 13:32   ` Schwarz, Konrad
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2025-11-21 11:09 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: konrad.schwarz, qemu-devel, philmd, Konrad Schwarz

On Fri, Nov 21, 2025 at 11:16:11AM +0100, Jan Kiszka wrote:
> You want to add
> 
> From: Konrad Schwarz <konrad.schwarz@siemens.com>
> 
> to the top of the commit message when submitting via a different account
> so that author and signer are aligned. Or use git sendemail which will
> do that when author and submitter address differ.
> 
> On 19.11.25 18:28, konrad.schwarz@gmail.com wrote:
> > The mkemmc.sh script calculates file sizes via `wc -c'.  `wc'
> > normally works by reading the entire file, resulting in O(n) performance.
> > 
> > Unix file systems obviously know a file's size and POSIX `ls' reports this
> > information unambiguously, so replacing `wc' with `ls' ensures O(1)
> > performance.  The files in question tend to be large making this change
> > worthwhile.
> 
> Valid point.
> 
> > 
> > Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
> > ---
> >  scripts/mkemmc.sh | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
> > index 45dc3f08fa..d2c4e84b16 100755
> > --- a/scripts/mkemmc.sh
> > +++ b/scripts/mkemmc.sh
> > @@ -37,13 +37,19 @@ usage() {
> >      exit "$1"
> >  }
> > 
> > +file_size() {
> > +	ls_line=$(ls -Hdog "$1") || return
> 
> This will not suppress the error message when a file does not exist or
> is not accessible, so:
> 
> ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
> 
> > +	printf %s\\n "$ls_line" | cut -d\  -f3
> > +	unset ls_line

This parsing of 'ls' output could be simplified by
using the 'stat' command with a format string to
request only the file size.

    stat --format=%s "$1"


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
  2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz
  2025-11-21 10:16 ` Jan Kiszka
@ 2025-11-21 12:25 ` Paolo Bonzini
       [not found]   ` <AS8PR10MB4680FED05DB882C0C7751E058AD5A@AS8PR10MB4680.EURPRD10.PROD.OUTLOOK.COM>
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2025-11-21 12:25 UTC (permalink / raw)
  To: konrad.schwarz, qemu-devel; +Cc: philmd, Konrad Schwarz, jan.kiszka

On 11/19/25 18:28, konrad.schwarz@gmail.com wrote:
> The mkemmc.sh script calculates file sizes via `wc -c'.  `wc'
> normally works by reading the entire file, resulting in O(n) performance.

Even something as mundane as 'wc' can surprise you!  Running "strace wc 
-c < somefile.txt" shows this:

fstat(0, {st_mode=S_IFREG|0644, st_size=6900, ...}) = 0
lseek(0, 0, SEEK_CUR)                   = 0
lseek(0, 6900, SEEK_CUR)                = 6900

So wc notices you don't need word or line counts, and takes a shortcut.

Paolo

> Unix file systems obviously know a file's size and POSIX `ls' reports this
> information unambiguously, so replacing `wc' with `ls' ensures O(1)
> performance.  The files in question tend to be large making this change
> worthwhile.
> 
> Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
> ---
>   scripts/mkemmc.sh | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
> index 45dc3f08fa..d2c4e84b16 100755
> --- a/scripts/mkemmc.sh
> +++ b/scripts/mkemmc.sh
> @@ -37,13 +37,19 @@ usage() {
>       exit "$1"
>   }
> 
> +file_size() {
> +	ls_line=$(ls -Hdog "$1") || return
> +	printf %s\\n "$ls_line" | cut -d\  -f3
> +	unset ls_line
> +}
> +
>   process_size() {
>       name=$1
>       image_file=$2
>       alignment=$3
>       image_arg=$4
>       if [ "${image_arg#*:}" = "$image_arg"  ]; then
> -        if ! size=$(wc -c < "$image_file" 2>/dev/null); then
> +        if ! size=$(file_size "$image_file"); then
>               echo "Missing $name image '$image_file'." >&2
>               exit 1
>           fi
> @@ -105,7 +111,7 @@ check_truncation() {
>       if [ "$image_file" = "/dev/zero" ]; then
>           return
>       fi
> -    if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
> +    if ! actual_size=$(file_size "$image_file"); then
>           echo "Missing image '$image_file'." >&2
>           exit 1
>       fi



^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
  2025-11-21 10:16 ` Jan Kiszka
  2025-11-21 11:09   ` Daniel P. Berrangé
@ 2025-11-21 13:32   ` Schwarz, Konrad
  2025-11-21 14:10     ` Jan Kiszka
  1 sibling, 1 reply; 8+ messages in thread
From: Schwarz, Konrad @ 2025-11-21 13:32 UTC (permalink / raw)
  To: Kiszka, Jan, konrad.schwarz@gmail.com, qemu-devel@nongnu.org
  Cc: philmd@linaro.org

> From: Kiszka, Jan (FT RPD CED) <jan.kiszka@siemens.com>
> Sent: Friday, November 21, 2025 11:16
> Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to
> O(1)
>  
> >  scripts/mkemmc.sh | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh index
> > 45dc3f08fa..d2c4e84b16 100755
> > --- a/scripts/mkemmc.sh
> > +++ b/scripts/mkemmc.sh
> > @@ -37,13 +37,19 @@ usage() {
> >      exit "$1"
> >  }
> >
> > +file_size() {
> > +	ls_line=$(ls -Hdog "$1") || return
> 
> This will not suppress the error message when a file does not exist or is not
> accessible, so:
> 
> ls_line=$(ls -Hdog "$1" 2>/dev/null) || return

My take on this is:

`ls' is able to produce informative diagnostic messages as it has
access to `errno'. The additional information, e.g., whether an 'EACCES',
an `ENOENT' or an `ENOTDIR' error has occurred, should in the majority
of cases help the user in fixing the underlying problem. I think it would be
counter-productive to suppress this.

(In fact, one could go further and consider using only the error message
of `ls' and not provide an own error message at all.
In this case, it would be especially easy to return `ls' status back to 
the script's invoker, by simply invoking `exit' without an argument.)

> 
> > +	printf %s\\n "$ls_line" | cut -d\  -f3
> > +	unset ls_line
> > +}
> > +
> >  process_size() {
> >      name=$1
> >      image_file=$2
> >      alignment=$3
> >      image_arg=$4
> >      if [ "${image_arg#*:}" = "$image_arg"  ]; then
> > -        if ! size=$(wc -c < "$image_file" 2>/dev/null); then
> > +        if ! size=$(file_size "$image_file"); then
> >              echo "Missing $name image '$image_file'." >&2
> >              exit 1
> >          fi
> > @@ -105,7 +111,7 @@ check_truncation() {
> >      if [ "$image_file" = "/dev/zero" ]; then
> >          return
> >      fi
> > -    if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
> > +    if ! actual_size=$(file_size "$image_file"); then
> >          echo "Missing image '$image_file'." >&2
> >          exit 1
> >      fi
> 
> Thanks,
> Jan
> 
> --
> Siemens AG, Foundational Technologies
> Linux Expert Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
  2025-11-21 11:09   ` Daniel P. Berrangé
@ 2025-11-21 14:07     ` Schwarz, Konrad
  0 siblings, 0 replies; 8+ messages in thread
From: Schwarz, Konrad @ 2025-11-21 14:07 UTC (permalink / raw)
  To: Daniel P. Berrangé, Kiszka, Jan
  Cc: konrad.schwarz@gmail.com, qemu-devel@nongnu.org,
	philmd@linaro.org

> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Friday, November 21, 2025 12:09
> Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to
> O(1)

> > ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
> >
> > > +   printf %s\\n "$ls_line" | cut -d\  -f3
> > > +   unset ls_line
> 
> This parsing of 'ls' output could be simplified by using the 'stat' command
> with a format string to request only the file size.
> 
>     stat --format=%s "$1"

The use of `wc' results from the fact that `stat' is not available on all platforms,
iMacOS in particular, to my knowledge.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
  2025-11-21 13:32   ` Schwarz, Konrad
@ 2025-11-21 14:10     ` Jan Kiszka
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kiszka @ 2025-11-21 14:10 UTC (permalink / raw)
  To: Schwarz, Konrad (FT RPD CED OES-DE), konrad.schwarz@gmail.com,
	qemu-devel@nongnu.org
  Cc: philmd@linaro.org

On 21.11.25 14:32, Schwarz, Konrad (FT RPD CED OES-DE) wrote:
>> From: Kiszka, Jan (FT RPD CED) <jan.kiszka@siemens.com>
>> Sent: Friday, November 21, 2025 11:16
>> Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to
>> O(1)
>>  
>>>  scripts/mkemmc.sh | 10 ++++++++--
>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh index
>>> 45dc3f08fa..d2c4e84b16 100755
>>> --- a/scripts/mkemmc.sh
>>> +++ b/scripts/mkemmc.sh
>>> @@ -37,13 +37,19 @@ usage() {
>>>      exit "$1"
>>>  }
>>>
>>> +file_size() {
>>> +	ls_line=$(ls -Hdog "$1") || return
>>
>> This will not suppress the error message when a file does not exist or is not
>> accessible, so:
>>
>> ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
> 
> My take on this is:
> 
> `ls' is able to produce informative diagnostic messages as it has
> access to `errno'. The additional information, e.g., whether an 'EACCES',
> an `ENOENT' or an `ENOTDIR' error has occurred, should in the majority
> of cases help the user in fixing the underlying problem. I think it would be
> counter-productive to suppress this.

Even if that was true, you should not fold this change into your O(1)
optimization and argue about that separately.

> 
> (In fact, one could go further and consider using only the error message
> of `ls' and not provide an own error message at all.

I intentionally wanted to handle the error outside of the low-level
function here. The caller should not care how the size is retrieved -
implementation detail.

But the ls approach still has a major issue: It gives a size for a
directory, rather than failing with "not a readable file".

Jan

> In this case, it would be especially easy to return `ls' status back to 
> the script's invoker, by simply invoking `exit' without an argument.)
> 
>>
>>> +	printf %s\\n "$ls_line" | cut -d\  -f3
>>> +	unset ls_line
>>> +}
>>> +
>>>  process_size() {
>>>      name=$1
>>>      image_file=$2
>>>      alignment=$3
>>>      image_arg=$4
>>>      if [ "${image_arg#*:}" = "$image_arg"  ]; then
>>> -        if ! size=$(wc -c < "$image_file" 2>/dev/null); then
>>> +        if ! size=$(file_size "$image_file"); then
>>>              echo "Missing $name image '$image_file'." >&2
>>>              exit 1
>>>          fi
>>> @@ -105,7 +111,7 @@ check_truncation() {
>>>      if [ "$image_file" = "/dev/zero" ]; then
>>>          return
>>>      fi
>>> -    if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
>>> +    if ! actual_size=$(file_size "$image_file"); then
>>>          echo "Missing image '$image_file'." >&2
>>>          exit 1
>>>      fi
>>
>> Thanks,
>> Jan
>>
>> --
>> Siemens AG, Foundational Technologies
>> Linux Expert Center

-- 
Siemens AG, Foundational Technologies
Linux Expert Center


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
       [not found]     ` <CABgObfb89UUyha3xz5d2MihRkg2WKBTpUOiNMA5o8oPE=MBMyQ@mail.gmail.com>
@ 2025-11-21 14:57       ` Schwarz, Konrad
  0 siblings, 0 replies; 8+ messages in thread
From: Schwarz, Konrad @ 2025-11-21 14:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: philmd@linaro.org, Kiszka, Jan, qemu-devel@nongnu.org

> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Friday, November 21, 2025 15:41
> > The script is designed to be cross-platform
> 
> Sure but it's not broken, just less effective, and the optimization is actually
> quite common. I have just checked and it's in all of GNU coreutils, FreeBSD,
> OpenBSD, NetBSD. Rust uutils does not do it for stdin, though.
> 
> > There is no guarantee that this optimization will be present on all potential
> platforms.
> > OTOH, it is virtually guaranteed that `ls' will always `stat()' the
> > file.  It would seem prudent to avoid potential pitfalls.
> 
> Parsing the output of "ls" is also a potential pitfall. In particular, I am not sure
> what happens if $image_file has a newline.  Maybe this is better:
> 
>     ls -Hdog "$1" | {
>         read ls_ign1 ls_ign2 ls_size ls_rest
>         echo $ls_size
>     }

You are right, I forgot about this case.

A perhaps simpler solution is to use `ls -Hdogq', which replaces non-printing characters
(including newline) in the filename with a `?'.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-22  5:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz
2025-11-21 10:16 ` Jan Kiszka
2025-11-21 11:09   ` Daniel P. Berrangé
2025-11-21 14:07     ` Schwarz, Konrad
2025-11-21 13:32   ` Schwarz, Konrad
2025-11-21 14:10     ` Jan Kiszka
2025-11-21 12:25 ` Paolo Bonzini
     [not found]   ` <AS8PR10MB4680FED05DB882C0C7751E058AD5A@AS8PR10MB4680.EURPRD10.PROD.OUTLOOK.COM>
     [not found]     ` <CABgObfb89UUyha3xz5d2MihRkg2WKBTpUOiNMA5o8oPE=MBMyQ@mail.gmail.com>
2025-11-21 14:57       ` Schwarz, Konrad

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).