* [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
@ 2025-11-19 17:28 konrad.schwarz
2025-11-21 10:16 ` Jan Kiszka
2025-11-21 12:25 ` Paolo Bonzini
0 siblings, 2 replies; 8+ messages in thread
From: konrad.schwarz @ 2025-11-19 17:28 UTC (permalink / raw)
To: qemu-devel; +Cc: philmd, Konrad Schwarz, jan.kiszka
The mkemmc.sh script calculates file sizes via `wc -c'. `wc'
normally works by reading the entire file, resulting in O(n) performance.
Unix file systems obviously know a file's size and POSIX `ls' reports this
information unambiguously, so replacing `wc' with `ls' ensures O(1)
performance. The files in question tend to be large making this change
worthwhile.
Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
---
scripts/mkemmc.sh | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
index 45dc3f08fa..d2c4e84b16 100755
--- a/scripts/mkemmc.sh
+++ b/scripts/mkemmc.sh
@@ -37,13 +37,19 @@ usage() {
exit "$1"
}
+file_size() {
+ ls_line=$(ls -Hdog "$1") || return
+ printf %s\\n "$ls_line" | cut -d\ -f3
+ unset ls_line
+}
+
process_size() {
name=$1
image_file=$2
alignment=$3
image_arg=$4
if [ "${image_arg#*:}" = "$image_arg" ]; then
- if ! size=$(wc -c < "$image_file" 2>/dev/null); then
+ if ! size=$(file_size "$image_file"); then
echo "Missing $name image '$image_file'." >&2
exit 1
fi
@@ -105,7 +111,7 @@ check_truncation() {
if [ "$image_file" = "/dev/zero" ]; then
return
fi
- if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
+ if ! actual_size=$(file_size "$image_file"); then
echo "Missing image '$image_file'." >&2
exit 1
fi
--
2.39.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz
@ 2025-11-21 10:16 ` Jan Kiszka
2025-11-21 11:09 ` Daniel P. Berrangé
2025-11-21 13:32 ` Schwarz, Konrad
2025-11-21 12:25 ` Paolo Bonzini
1 sibling, 2 replies; 8+ messages in thread
From: Jan Kiszka @ 2025-11-21 10:16 UTC (permalink / raw)
To: konrad.schwarz, qemu-devel; +Cc: philmd, Konrad Schwarz
You want to add
From: Konrad Schwarz <konrad.schwarz@siemens.com>
to the top of the commit message when submitting via a different account
so that author and signer are aligned. Or use git sendemail which will
do that when author and submitter address differ.
On 19.11.25 18:28, konrad.schwarz@gmail.com wrote:
> The mkemmc.sh script calculates file sizes via `wc -c'. `wc'
> normally works by reading the entire file, resulting in O(n) performance.
>
> Unix file systems obviously know a file's size and POSIX `ls' reports this
> information unambiguously, so replacing `wc' with `ls' ensures O(1)
> performance. The files in question tend to be large making this change
> worthwhile.
Valid point.
>
> Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
> ---
> scripts/mkemmc.sh | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
> index 45dc3f08fa..d2c4e84b16 100755
> --- a/scripts/mkemmc.sh
> +++ b/scripts/mkemmc.sh
> @@ -37,13 +37,19 @@ usage() {
> exit "$1"
> }
>
> +file_size() {
> + ls_line=$(ls -Hdog "$1") || return
This will not suppress the error message when a file does not exist or
is not accessible, so:
ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
> + printf %s\\n "$ls_line" | cut -d\ -f3
> + unset ls_line
> +}
> +
> process_size() {
> name=$1
> image_file=$2
> alignment=$3
> image_arg=$4
> if [ "${image_arg#*:}" = "$image_arg" ]; then
> - if ! size=$(wc -c < "$image_file" 2>/dev/null); then
> + if ! size=$(file_size "$image_file"); then
> echo "Missing $name image '$image_file'." >&2
> exit 1
> fi
> @@ -105,7 +111,7 @@ check_truncation() {
> if [ "$image_file" = "/dev/zero" ]; then
> return
> fi
> - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
> + if ! actual_size=$(file_size "$image_file"); then
> echo "Missing image '$image_file'." >&2
> exit 1
> fi
Thanks,
Jan
--
Siemens AG, Foundational Technologies
Linux Expert Center
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
2025-11-21 10:16 ` Jan Kiszka
@ 2025-11-21 11:09 ` Daniel P. Berrangé
2025-11-21 14:07 ` Schwarz, Konrad
2025-11-21 13:32 ` Schwarz, Konrad
1 sibling, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2025-11-21 11:09 UTC (permalink / raw)
To: Jan Kiszka; +Cc: konrad.schwarz, qemu-devel, philmd, Konrad Schwarz
On Fri, Nov 21, 2025 at 11:16:11AM +0100, Jan Kiszka wrote:
> You want to add
>
> From: Konrad Schwarz <konrad.schwarz@siemens.com>
>
> to the top of the commit message when submitting via a different account
> so that author and signer are aligned. Or use git sendemail which will
> do that when author and submitter address differ.
>
> On 19.11.25 18:28, konrad.schwarz@gmail.com wrote:
> > The mkemmc.sh script calculates file sizes via `wc -c'. `wc'
> > normally works by reading the entire file, resulting in O(n) performance.
> >
> > Unix file systems obviously know a file's size and POSIX `ls' reports this
> > information unambiguously, so replacing `wc' with `ls' ensures O(1)
> > performance. The files in question tend to be large making this change
> > worthwhile.
>
> Valid point.
>
> >
> > Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
> > ---
> > scripts/mkemmc.sh | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
> > index 45dc3f08fa..d2c4e84b16 100755
> > --- a/scripts/mkemmc.sh
> > +++ b/scripts/mkemmc.sh
> > @@ -37,13 +37,19 @@ usage() {
> > exit "$1"
> > }
> >
> > +file_size() {
> > + ls_line=$(ls -Hdog "$1") || return
>
> This will not suppress the error message when a file does not exist or
> is not accessible, so:
>
> ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
>
> > + printf %s\\n "$ls_line" | cut -d\ -f3
> > + unset ls_line
This parsing of 'ls' output could be simplified by
using the 'stat' command with a format string to
request only the file size.
stat --format=%s "$1"
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz
2025-11-21 10:16 ` Jan Kiszka
@ 2025-11-21 12:25 ` Paolo Bonzini
[not found] ` <AS8PR10MB4680FED05DB882C0C7751E058AD5A@AS8PR10MB4680.EURPRD10.PROD.OUTLOOK.COM>
1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2025-11-21 12:25 UTC (permalink / raw)
To: konrad.schwarz, qemu-devel; +Cc: philmd, Konrad Schwarz, jan.kiszka
On 11/19/25 18:28, konrad.schwarz@gmail.com wrote:
> The mkemmc.sh script calculates file sizes via `wc -c'. `wc'
> normally works by reading the entire file, resulting in O(n) performance.
Even something as mundane as 'wc' can surprise you! Running "strace wc
-c < somefile.txt" shows this:
fstat(0, {st_mode=S_IFREG|0644, st_size=6900, ...}) = 0
lseek(0, 0, SEEK_CUR) = 0
lseek(0, 6900, SEEK_CUR) = 6900
So wc notices you don't need word or line counts, and takes a shortcut.
Paolo
> Unix file systems obviously know a file's size and POSIX `ls' reports this
> information unambiguously, so replacing `wc' with `ls' ensures O(1)
> performance. The files in question tend to be large making this change
> worthwhile.
>
> Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com>
> ---
> scripts/mkemmc.sh | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh
> index 45dc3f08fa..d2c4e84b16 100755
> --- a/scripts/mkemmc.sh
> +++ b/scripts/mkemmc.sh
> @@ -37,13 +37,19 @@ usage() {
> exit "$1"
> }
>
> +file_size() {
> + ls_line=$(ls -Hdog "$1") || return
> + printf %s\\n "$ls_line" | cut -d\ -f3
> + unset ls_line
> +}
> +
> process_size() {
> name=$1
> image_file=$2
> alignment=$3
> image_arg=$4
> if [ "${image_arg#*:}" = "$image_arg" ]; then
> - if ! size=$(wc -c < "$image_file" 2>/dev/null); then
> + if ! size=$(file_size "$image_file"); then
> echo "Missing $name image '$image_file'." >&2
> exit 1
> fi
> @@ -105,7 +111,7 @@ check_truncation() {
> if [ "$image_file" = "/dev/zero" ]; then
> return
> fi
> - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
> + if ! actual_size=$(file_size "$image_file"); then
> echo "Missing image '$image_file'." >&2
> exit 1
> fi
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
2025-11-21 10:16 ` Jan Kiszka
2025-11-21 11:09 ` Daniel P. Berrangé
@ 2025-11-21 13:32 ` Schwarz, Konrad
2025-11-21 14:10 ` Jan Kiszka
1 sibling, 1 reply; 8+ messages in thread
From: Schwarz, Konrad @ 2025-11-21 13:32 UTC (permalink / raw)
To: Kiszka, Jan, konrad.schwarz@gmail.com, qemu-devel@nongnu.org
Cc: philmd@linaro.org
> From: Kiszka, Jan (FT RPD CED) <jan.kiszka@siemens.com>
> Sent: Friday, November 21, 2025 11:16
> Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to
> O(1)
>
> > scripts/mkemmc.sh | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh index
> > 45dc3f08fa..d2c4e84b16 100755
> > --- a/scripts/mkemmc.sh
> > +++ b/scripts/mkemmc.sh
> > @@ -37,13 +37,19 @@ usage() {
> > exit "$1"
> > }
> >
> > +file_size() {
> > + ls_line=$(ls -Hdog "$1") || return
>
> This will not suppress the error message when a file does not exist or is not
> accessible, so:
>
> ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
My take on this is:
`ls' is able to produce informative diagnostic messages as it has
access to `errno'. The additional information, e.g., whether an 'EACCES',
an `ENOENT' or an `ENOTDIR' error has occurred, should in the majority
of cases help the user in fixing the underlying problem. I think it would be
counter-productive to suppress this.
(In fact, one could go further and consider using only the error message
of `ls' and not provide an own error message at all.
In this case, it would be especially easy to return `ls' status back to
the script's invoker, by simply invoking `exit' without an argument.)
>
> > + printf %s\\n "$ls_line" | cut -d\ -f3
> > + unset ls_line
> > +}
> > +
> > process_size() {
> > name=$1
> > image_file=$2
> > alignment=$3
> > image_arg=$4
> > if [ "${image_arg#*:}" = "$image_arg" ]; then
> > - if ! size=$(wc -c < "$image_file" 2>/dev/null); then
> > + if ! size=$(file_size "$image_file"); then
> > echo "Missing $name image '$image_file'." >&2
> > exit 1
> > fi
> > @@ -105,7 +111,7 @@ check_truncation() {
> > if [ "$image_file" = "/dev/zero" ]; then
> > return
> > fi
> > - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
> > + if ! actual_size=$(file_size "$image_file"); then
> > echo "Missing image '$image_file'." >&2
> > exit 1
> > fi
>
> Thanks,
> Jan
>
> --
> Siemens AG, Foundational Technologies
> Linux Expert Center
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
2025-11-21 11:09 ` Daniel P. Berrangé
@ 2025-11-21 14:07 ` Schwarz, Konrad
0 siblings, 0 replies; 8+ messages in thread
From: Schwarz, Konrad @ 2025-11-21 14:07 UTC (permalink / raw)
To: Daniel P. Berrangé, Kiszka, Jan
Cc: konrad.schwarz@gmail.com, qemu-devel@nongnu.org,
philmd@linaro.org
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Friday, November 21, 2025 12:09
> Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to
> O(1)
> > ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
> >
> > > + printf %s\\n "$ls_line" | cut -d\ -f3
> > > + unset ls_line
>
> This parsing of 'ls' output could be simplified by using the 'stat' command
> with a format string to request only the file size.
>
> stat --format=%s "$1"
The use of `wc' results from the fact that `stat' is not available on all platforms,
iMacOS in particular, to my knowledge.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
2025-11-21 13:32 ` Schwarz, Konrad
@ 2025-11-21 14:10 ` Jan Kiszka
0 siblings, 0 replies; 8+ messages in thread
From: Jan Kiszka @ 2025-11-21 14:10 UTC (permalink / raw)
To: Schwarz, Konrad (FT RPD CED OES-DE), konrad.schwarz@gmail.com,
qemu-devel@nongnu.org
Cc: philmd@linaro.org
On 21.11.25 14:32, Schwarz, Konrad (FT RPD CED OES-DE) wrote:
>> From: Kiszka, Jan (FT RPD CED) <jan.kiszka@siemens.com>
>> Sent: Friday, November 21, 2025 11:16
>> Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to
>> O(1)
>>
>>> scripts/mkemmc.sh | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh index
>>> 45dc3f08fa..d2c4e84b16 100755
>>> --- a/scripts/mkemmc.sh
>>> +++ b/scripts/mkemmc.sh
>>> @@ -37,13 +37,19 @@ usage() {
>>> exit "$1"
>>> }
>>>
>>> +file_size() {
>>> + ls_line=$(ls -Hdog "$1") || return
>>
>> This will not suppress the error message when a file does not exist or is not
>> accessible, so:
>>
>> ls_line=$(ls -Hdog "$1" 2>/dev/null) || return
>
> My take on this is:
>
> `ls' is able to produce informative diagnostic messages as it has
> access to `errno'. The additional information, e.g., whether an 'EACCES',
> an `ENOENT' or an `ENOTDIR' error has occurred, should in the majority
> of cases help the user in fixing the underlying problem. I think it would be
> counter-productive to suppress this.
Even if that was true, you should not fold this change into your O(1)
optimization and argue about that separately.
>
> (In fact, one could go further and consider using only the error message
> of `ls' and not provide an own error message at all.
I intentionally wanted to handle the error outside of the low-level
function here. The caller should not care how the size is retrieved -
implementation detail.
But the ls approach still has a major issue: It gives a size for a
directory, rather than failing with "not a readable file".
Jan
> In this case, it would be especially easy to return `ls' status back to
> the script's invoker, by simply invoking `exit' without an argument.)
>
>>
>>> + printf %s\\n "$ls_line" | cut -d\ -f3
>>> + unset ls_line
>>> +}
>>> +
>>> process_size() {
>>> name=$1
>>> image_file=$2
>>> alignment=$3
>>> image_arg=$4
>>> if [ "${image_arg#*:}" = "$image_arg" ]; then
>>> - if ! size=$(wc -c < "$image_file" 2>/dev/null); then
>>> + if ! size=$(file_size "$image_file"); then
>>> echo "Missing $name image '$image_file'." >&2
>>> exit 1
>>> fi
>>> @@ -105,7 +111,7 @@ check_truncation() {
>>> if [ "$image_file" = "/dev/zero" ]; then
>>> return
>>> fi
>>> - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then
>>> + if ! actual_size=$(file_size "$image_file"); then
>>> echo "Missing image '$image_file'." >&2
>>> exit 1
>>> fi
>>
>> Thanks,
>> Jan
>>
>> --
>> Siemens AG, Foundational Technologies
>> Linux Expert Center
--
Siemens AG, Foundational Technologies
Linux Expert Center
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1)
[not found] ` <CABgObfb89UUyha3xz5d2MihRkg2WKBTpUOiNMA5o8oPE=MBMyQ@mail.gmail.com>
@ 2025-11-21 14:57 ` Schwarz, Konrad
0 siblings, 0 replies; 8+ messages in thread
From: Schwarz, Konrad @ 2025-11-21 14:57 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: philmd@linaro.org, Kiszka, Jan, qemu-devel@nongnu.org
> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Friday, November 21, 2025 15:41
> > The script is designed to be cross-platform
>
> Sure but it's not broken, just less effective, and the optimization is actually
> quite common. I have just checked and it's in all of GNU coreutils, FreeBSD,
> OpenBSD, NetBSD. Rust uutils does not do it for stdin, though.
>
> > There is no guarantee that this optimization will be present on all potential
> platforms.
> > OTOH, it is virtually guaranteed that `ls' will always `stat()' the
> > file. It would seem prudent to avoid potential pitfalls.
>
> Parsing the output of "ls" is also a potential pitfall. In particular, I am not sure
> what happens if $image_file has a newline. Maybe this is better:
>
> ls -Hdog "$1" | {
> read ls_ign1 ls_ign2 ls_size ls_rest
> echo $ls_size
> }
You are right, I forgot about this case.
A perhaps simpler solution is to use `ls -Hdogq', which replaces non-printing characters
(including newline) in the filename with a `?'.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-11-22 5:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz
2025-11-21 10:16 ` Jan Kiszka
2025-11-21 11:09 ` Daniel P. Berrangé
2025-11-21 14:07 ` Schwarz, Konrad
2025-11-21 13:32 ` Schwarz, Konrad
2025-11-21 14:10 ` Jan Kiszka
2025-11-21 12:25 ` Paolo Bonzini
[not found] ` <AS8PR10MB4680FED05DB882C0C7751E058AD5A@AS8PR10MB4680.EURPRD10.PROD.OUTLOOK.COM>
[not found] ` <CABgObfb89UUyha3xz5d2MihRkg2WKBTpUOiNMA5o8oPE=MBMyQ@mail.gmail.com>
2025-11-21 14:57 ` Schwarz, Konrad
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).