* [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) @ 2025-11-19 17:28 konrad.schwarz 2025-11-21 10:16 ` Jan Kiszka 2025-11-21 12:25 ` Paolo Bonzini 0 siblings, 2 replies; 8+ messages in thread From: konrad.schwarz @ 2025-11-19 17:28 UTC (permalink / raw) To: qemu-devel; +Cc: philmd, Konrad Schwarz, jan.kiszka The mkemmc.sh script calculates file sizes via `wc -c'. `wc' normally works by reading the entire file, resulting in O(n) performance. Unix file systems obviously know a file's size and POSIX `ls' reports this information unambiguously, so replacing `wc' with `ls' ensures O(1) performance. The files in question tend to be large making this change worthwhile. Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com> --- scripts/mkemmc.sh | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh index 45dc3f08fa..d2c4e84b16 100755 --- a/scripts/mkemmc.sh +++ b/scripts/mkemmc.sh @@ -37,13 +37,19 @@ usage() { exit "$1" } +file_size() { + ls_line=$(ls -Hdog "$1") || return + printf %s\\n "$ls_line" | cut -d\ -f3 + unset ls_line +} + process_size() { name=$1 image_file=$2 alignment=$3 image_arg=$4 if [ "${image_arg#*:}" = "$image_arg" ]; then - if ! size=$(wc -c < "$image_file" 2>/dev/null); then + if ! size=$(file_size "$image_file"); then echo "Missing $name image '$image_file'." >&2 exit 1 fi @@ -105,7 +111,7 @@ check_truncation() { if [ "$image_file" = "/dev/zero" ]; then return fi - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then + if ! actual_size=$(file_size "$image_file"); then echo "Missing image '$image_file'." >&2 exit 1 fi -- 2.39.5 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) 2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz @ 2025-11-21 10:16 ` Jan Kiszka 2025-11-21 11:09 ` Daniel P. Berrangé 2025-11-21 13:32 ` Schwarz, Konrad 2025-11-21 12:25 ` Paolo Bonzini 1 sibling, 2 replies; 8+ messages in thread From: Jan Kiszka @ 2025-11-21 10:16 UTC (permalink / raw) To: konrad.schwarz, qemu-devel; +Cc: philmd, Konrad Schwarz You want to add From: Konrad Schwarz <konrad.schwarz@siemens.com> to the top of the commit message when submitting via a different account so that author and signer are aligned. Or use git sendemail which will do that when author and submitter address differ. On 19.11.25 18:28, konrad.schwarz@gmail.com wrote: > The mkemmc.sh script calculates file sizes via `wc -c'. `wc' > normally works by reading the entire file, resulting in O(n) performance. > > Unix file systems obviously know a file's size and POSIX `ls' reports this > information unambiguously, so replacing `wc' with `ls' ensures O(1) > performance. The files in question tend to be large making this change > worthwhile. Valid point. > > Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com> > --- > scripts/mkemmc.sh | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh > index 45dc3f08fa..d2c4e84b16 100755 > --- a/scripts/mkemmc.sh > +++ b/scripts/mkemmc.sh > @@ -37,13 +37,19 @@ usage() { > exit "$1" > } > > +file_size() { > + ls_line=$(ls -Hdog "$1") || return This will not suppress the error message when a file does not exist or is not accessible, so: ls_line=$(ls -Hdog "$1" 2>/dev/null) || return > + printf %s\\n "$ls_line" | cut -d\ -f3 > + unset ls_line > +} > + > process_size() { > name=$1 > image_file=$2 > alignment=$3 > image_arg=$4 > if [ "${image_arg#*:}" = "$image_arg" ]; then > - if ! size=$(wc -c < "$image_file" 2>/dev/null); then > + if ! size=$(file_size "$image_file"); then > echo "Missing $name image '$image_file'." >&2 > exit 1 > fi > @@ -105,7 +111,7 @@ check_truncation() { > if [ "$image_file" = "/dev/zero" ]; then > return > fi > - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then > + if ! actual_size=$(file_size "$image_file"); then > echo "Missing image '$image_file'." >&2 > exit 1 > fi Thanks, Jan -- Siemens AG, Foundational Technologies Linux Expert Center ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) 2025-11-21 10:16 ` Jan Kiszka @ 2025-11-21 11:09 ` Daniel P. Berrangé 2025-11-21 14:07 ` Schwarz, Konrad 2025-11-21 13:32 ` Schwarz, Konrad 1 sibling, 1 reply; 8+ messages in thread From: Daniel P. Berrangé @ 2025-11-21 11:09 UTC (permalink / raw) To: Jan Kiszka; +Cc: konrad.schwarz, qemu-devel, philmd, Konrad Schwarz On Fri, Nov 21, 2025 at 11:16:11AM +0100, Jan Kiszka wrote: > You want to add > > From: Konrad Schwarz <konrad.schwarz@siemens.com> > > to the top of the commit message when submitting via a different account > so that author and signer are aligned. Or use git sendemail which will > do that when author and submitter address differ. > > On 19.11.25 18:28, konrad.schwarz@gmail.com wrote: > > The mkemmc.sh script calculates file sizes via `wc -c'. `wc' > > normally works by reading the entire file, resulting in O(n) performance. > > > > Unix file systems obviously know a file's size and POSIX `ls' reports this > > information unambiguously, so replacing `wc' with `ls' ensures O(1) > > performance. The files in question tend to be large making this change > > worthwhile. > > Valid point. > > > > > Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com> > > --- > > scripts/mkemmc.sh | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh > > index 45dc3f08fa..d2c4e84b16 100755 > > --- a/scripts/mkemmc.sh > > +++ b/scripts/mkemmc.sh > > @@ -37,13 +37,19 @@ usage() { > > exit "$1" > > } > > > > +file_size() { > > + ls_line=$(ls -Hdog "$1") || return > > This will not suppress the error message when a file does not exist or > is not accessible, so: > > ls_line=$(ls -Hdog "$1" 2>/dev/null) || return > > > + printf %s\\n "$ls_line" | cut -d\ -f3 > > + unset ls_line This parsing of 'ls' output could be simplified by using the 'stat' command with a format string to request only the file size. stat --format=%s "$1" With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) 2025-11-21 11:09 ` Daniel P. Berrangé @ 2025-11-21 14:07 ` Schwarz, Konrad 0 siblings, 0 replies; 8+ messages in thread From: Schwarz, Konrad @ 2025-11-21 14:07 UTC (permalink / raw) To: Daniel P. Berrangé, Kiszka, Jan Cc: konrad.schwarz@gmail.com, qemu-devel@nongnu.org, philmd@linaro.org > From: Daniel P. Berrangé <berrange@redhat.com> > Sent: Friday, November 21, 2025 12:09 > Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to > O(1) > > ls_line=$(ls -Hdog "$1" 2>/dev/null) || return > > > > > + printf %s\\n "$ls_line" | cut -d\ -f3 > > > + unset ls_line > > This parsing of 'ls' output could be simplified by using the 'stat' command > with a format string to request only the file size. > > stat --format=%s "$1" The use of `wc' results from the fact that `stat' is not available on all platforms, iMacOS in particular, to my knowledge. ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) 2025-11-21 10:16 ` Jan Kiszka 2025-11-21 11:09 ` Daniel P. Berrangé @ 2025-11-21 13:32 ` Schwarz, Konrad 2025-11-21 14:10 ` Jan Kiszka 1 sibling, 1 reply; 8+ messages in thread From: Schwarz, Konrad @ 2025-11-21 13:32 UTC (permalink / raw) To: Kiszka, Jan, konrad.schwarz@gmail.com, qemu-devel@nongnu.org Cc: philmd@linaro.org > From: Kiszka, Jan (FT RPD CED) <jan.kiszka@siemens.com> > Sent: Friday, November 21, 2025 11:16 > Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to > O(1) > > > scripts/mkemmc.sh | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh index > > 45dc3f08fa..d2c4e84b16 100755 > > --- a/scripts/mkemmc.sh > > +++ b/scripts/mkemmc.sh > > @@ -37,13 +37,19 @@ usage() { > > exit "$1" > > } > > > > +file_size() { > > + ls_line=$(ls -Hdog "$1") || return > > This will not suppress the error message when a file does not exist or is not > accessible, so: > > ls_line=$(ls -Hdog "$1" 2>/dev/null) || return My take on this is: `ls' is able to produce informative diagnostic messages as it has access to `errno'. The additional information, e.g., whether an 'EACCES', an `ENOENT' or an `ENOTDIR' error has occurred, should in the majority of cases help the user in fixing the underlying problem. I think it would be counter-productive to suppress this. (In fact, one could go further and consider using only the error message of `ls' and not provide an own error message at all. In this case, it would be especially easy to return `ls' status back to the script's invoker, by simply invoking `exit' without an argument.) > > > + printf %s\\n "$ls_line" | cut -d\ -f3 > > + unset ls_line > > +} > > + > > process_size() { > > name=$1 > > image_file=$2 > > alignment=$3 > > image_arg=$4 > > if [ "${image_arg#*:}" = "$image_arg" ]; then > > - if ! size=$(wc -c < "$image_file" 2>/dev/null); then > > + if ! size=$(file_size "$image_file"); then > > echo "Missing $name image '$image_file'." >&2 > > exit 1 > > fi > > @@ -105,7 +111,7 @@ check_truncation() { > > if [ "$image_file" = "/dev/zero" ]; then > > return > > fi > > - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then > > + if ! actual_size=$(file_size "$image_file"); then > > echo "Missing image '$image_file'." >&2 > > exit 1 > > fi > > Thanks, > Jan > > -- > Siemens AG, Foundational Technologies > Linux Expert Center ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) 2025-11-21 13:32 ` Schwarz, Konrad @ 2025-11-21 14:10 ` Jan Kiszka 0 siblings, 0 replies; 8+ messages in thread From: Jan Kiszka @ 2025-11-21 14:10 UTC (permalink / raw) To: Schwarz, Konrad (FT RPD CED OES-DE), konrad.schwarz@gmail.com, qemu-devel@nongnu.org Cc: philmd@linaro.org On 21.11.25 14:32, Schwarz, Konrad (FT RPD CED OES-DE) wrote: >> From: Kiszka, Jan (FT RPD CED) <jan.kiszka@siemens.com> >> Sent: Friday, November 21, 2025 11:16 >> Subject: Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to >> O(1) >> >>> scripts/mkemmc.sh | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh index >>> 45dc3f08fa..d2c4e84b16 100755 >>> --- a/scripts/mkemmc.sh >>> +++ b/scripts/mkemmc.sh >>> @@ -37,13 +37,19 @@ usage() { >>> exit "$1" >>> } >>> >>> +file_size() { >>> + ls_line=$(ls -Hdog "$1") || return >> >> This will not suppress the error message when a file does not exist or is not >> accessible, so: >> >> ls_line=$(ls -Hdog "$1" 2>/dev/null) || return > > My take on this is: > > `ls' is able to produce informative diagnostic messages as it has > access to `errno'. The additional information, e.g., whether an 'EACCES', > an `ENOENT' or an `ENOTDIR' error has occurred, should in the majority > of cases help the user in fixing the underlying problem. I think it would be > counter-productive to suppress this. Even if that was true, you should not fold this change into your O(1) optimization and argue about that separately. > > (In fact, one could go further and consider using only the error message > of `ls' and not provide an own error message at all. I intentionally wanted to handle the error outside of the low-level function here. The caller should not care how the size is retrieved - implementation detail. But the ls approach still has a major issue: It gives a size for a directory, rather than failing with "not a readable file". Jan > In this case, it would be especially easy to return `ls' status back to > the script's invoker, by simply invoking `exit' without an argument.) > >> >>> + printf %s\\n "$ls_line" | cut -d\ -f3 >>> + unset ls_line >>> +} >>> + >>> process_size() { >>> name=$1 >>> image_file=$2 >>> alignment=$3 >>> image_arg=$4 >>> if [ "${image_arg#*:}" = "$image_arg" ]; then >>> - if ! size=$(wc -c < "$image_file" 2>/dev/null); then >>> + if ! size=$(file_size "$image_file"); then >>> echo "Missing $name image '$image_file'." >&2 >>> exit 1 >>> fi >>> @@ -105,7 +111,7 @@ check_truncation() { >>> if [ "$image_file" = "/dev/zero" ]; then >>> return >>> fi >>> - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then >>> + if ! actual_size=$(file_size "$image_file"); then >>> echo "Missing image '$image_file'." >&2 >>> exit 1 >>> fi >> >> Thanks, >> Jan >> >> -- >> Siemens AG, Foundational Technologies >> Linux Expert Center -- Siemens AG, Foundational Technologies Linux Expert Center ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) 2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz 2025-11-21 10:16 ` Jan Kiszka @ 2025-11-21 12:25 ` Paolo Bonzini [not found] ` <AS8PR10MB4680FED05DB882C0C7751E058AD5A@AS8PR10MB4680.EURPRD10.PROD.OUTLOOK.COM> 1 sibling, 1 reply; 8+ messages in thread From: Paolo Bonzini @ 2025-11-21 12:25 UTC (permalink / raw) To: konrad.schwarz, qemu-devel; +Cc: philmd, Konrad Schwarz, jan.kiszka On 11/19/25 18:28, konrad.schwarz@gmail.com wrote: > The mkemmc.sh script calculates file sizes via `wc -c'. `wc' > normally works by reading the entire file, resulting in O(n) performance. Even something as mundane as 'wc' can surprise you! Running "strace wc -c < somefile.txt" shows this: fstat(0, {st_mode=S_IFREG|0644, st_size=6900, ...}) = 0 lseek(0, 0, SEEK_CUR) = 0 lseek(0, 6900, SEEK_CUR) = 6900 So wc notices you don't need word or line counts, and takes a shortcut. Paolo > Unix file systems obviously know a file's size and POSIX `ls' reports this > information unambiguously, so replacing `wc' with `ls' ensures O(1) > performance. The files in question tend to be large making this change > worthwhile. > > Signed-off-by: Konrad Schwarz <konrad.schwarz@siemens.com> > --- > scripts/mkemmc.sh | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/scripts/mkemmc.sh b/scripts/mkemmc.sh > index 45dc3f08fa..d2c4e84b16 100755 > --- a/scripts/mkemmc.sh > +++ b/scripts/mkemmc.sh > @@ -37,13 +37,19 @@ usage() { > exit "$1" > } > > +file_size() { > + ls_line=$(ls -Hdog "$1") || return > + printf %s\\n "$ls_line" | cut -d\ -f3 > + unset ls_line > +} > + > process_size() { > name=$1 > image_file=$2 > alignment=$3 > image_arg=$4 > if [ "${image_arg#*:}" = "$image_arg" ]; then > - if ! size=$(wc -c < "$image_file" 2>/dev/null); then > + if ! size=$(file_size "$image_file"); then > echo "Missing $name image '$image_file'." >&2 > exit 1 > fi > @@ -105,7 +111,7 @@ check_truncation() { > if [ "$image_file" = "/dev/zero" ]; then > return > fi > - if ! actual_size=$(wc -c < "$image_file" 2>/dev/null); then > + if ! actual_size=$(file_size "$image_file"); then > echo "Missing image '$image_file'." >&2 > exit 1 > fi ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <AS8PR10MB4680FED05DB882C0C7751E058AD5A@AS8PR10MB4680.EURPRD10.PROD.OUTLOOK.COM>]
[parent not found: <CABgObfb89UUyha3xz5d2MihRkg2WKBTpUOiNMA5o8oPE=MBMyQ@mail.gmail.com>]
* RE: [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) [not found] ` <CABgObfb89UUyha3xz5d2MihRkg2WKBTpUOiNMA5o8oPE=MBMyQ@mail.gmail.com> @ 2025-11-21 14:57 ` Schwarz, Konrad 0 siblings, 0 replies; 8+ messages in thread From: Schwarz, Konrad @ 2025-11-21 14:57 UTC (permalink / raw) To: Paolo Bonzini; +Cc: philmd@linaro.org, Kiszka, Jan, qemu-devel@nongnu.org > -----Original Message----- > From: Paolo Bonzini <pbonzini@redhat.com> > Sent: Friday, November 21, 2025 15:41 > > The script is designed to be cross-platform > > Sure but it's not broken, just less effective, and the optimization is actually > quite common. I have just checked and it's in all of GNU coreutils, FreeBSD, > OpenBSD, NetBSD. Rust uutils does not do it for stdin, though. > > > There is no guarantee that this optimization will be present on all potential > platforms. > > OTOH, it is virtually guaranteed that `ls' will always `stat()' the > > file. It would seem prudent to avoid potential pitfalls. > > Parsing the output of "ls" is also a potential pitfall. In particular, I am not sure > what happens if $image_file has a newline. Maybe this is better: > > ls -Hdog "$1" | { > read ls_ign1 ls_ign2 ls_size ls_rest > echo $ls_size > } You are right, I forgot about this case. A perhaps simpler solution is to use `ls -Hdogq', which replaces non-printing characters (including newline) in the filename with a `?'. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-11-22 5:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-19 17:28 [PATCH 1/1] scripts: Changed potential O(n) file size calculation to O(1) konrad.schwarz
2025-11-21 10:16 ` Jan Kiszka
2025-11-21 11:09 ` Daniel P. Berrangé
2025-11-21 14:07 ` Schwarz, Konrad
2025-11-21 13:32 ` Schwarz, Konrad
2025-11-21 14:10 ` Jan Kiszka
2025-11-21 12:25 ` Paolo Bonzini
[not found] ` <AS8PR10MB4680FED05DB882C0C7751E058AD5A@AS8PR10MB4680.EURPRD10.PROD.OUTLOOK.COM>
[not found] ` <CABgObfb89UUyha3xz5d2MihRkg2WKBTpUOiNMA5o8oPE=MBMyQ@mail.gmail.com>
2025-11-21 14:57 ` Schwarz, Konrad
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).