All of lore.kernel.org
 help / color / mirror / Atom feed
* [bitbake-devel][PATCH] siggen: Fix sigtask data not being renamed atomically
@ 2022-08-01 20:34 Joshua Watt
  2022-08-01 21:05 ` Christopher Larson
  2022-08-02 13:14 ` [bitbake-devel][PATCH v2] " Joshua Watt
  0 siblings, 2 replies; 6+ messages in thread
From: Joshua Watt @ 2022-08-01 20:34 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Joshua Watt

Signature generation uses mkstemp() to get a file descriptor to a unique
file and then write the signature into it, however it closed the file
before doing the chmod() and rename() operations. Closing the file means
that other mkstemp() could potentially open the same file and race with
the chmod() and rename(), causing a error. While it may not sound like
this would be very likely, glibc (at least) generates the filename for
mkstemp() using the system clock, meaning that it is much more likely
for highly parallel builds sharing sstate over NFS to encounter the race
condition.

To fix the problem, perform the chmod() and rename() while the file is
still open, since this prevents other mkstemp() calls from being able to
open the file (due to the O_EXCL flag).

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index 3f3d6df54d..55f25235df 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -426,18 +426,18 @@ class SignatureGeneratorBasic(SignatureGenerator):
                 sigfile = sigfile.replace(self.taskhash[tid], computed_taskhash)
 
         fd, tmpfile = tempfile.mkstemp(dir=os.path.dirname(sigfile), prefix="sigtask.")
-        try:
-            with bb.compress.zstd.open(fd, "wt", encoding="utf-8", num_threads=1) as f:
+        with bb.compress.zstd.open(fd, "wt", encoding="utf-8", num_threads=1) as f:
+            try:
                 json.dump(data, f, sort_keys=True, separators=(",", ":"), cls=SetEncoder)
                 f.flush()
-            os.chmod(tmpfile, 0o664)
-            bb.utils.rename(tmpfile, sigfile)
-        except (OSError, IOError) as err:
-            try:
-                os.unlink(tmpfile)
-            except OSError:
-                pass
-            raise err
+                os.chmod(tmpfile, 0o664)
+                bb.utils.rename(tmpfile, sigfile)
+            except (OSError, IOError) as err:
+                try:
+                    os.unlink(tmpfile)
+                except OSError:
+                    pass
+                raise err
 
     def dump_sigfn(self, fn, dataCaches, options):
         if fn in self.taskdeps:
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [bitbake-devel][PATCH] siggen: Fix sigtask data not being renamed atomically
  2022-08-01 20:34 [bitbake-devel][PATCH] siggen: Fix sigtask data not being renamed atomically Joshua Watt
@ 2022-08-01 21:05 ` Christopher Larson
  2022-08-02 13:14 ` [bitbake-devel][PATCH v2] " Joshua Watt
  1 sibling, 0 replies; 6+ messages in thread
From: Christopher Larson @ 2022-08-01 21:05 UTC (permalink / raw)
  To: Joshua Watt; +Cc: bitbake-devel

[-- Attachment #1: Type: text/plain, Size: 3192 bytes --]

Hmm, would it be better to os.chmod(fd, mode) rather than by filename,
since it's still open?

On Mon, Aug 1, 2022 at 1:34 PM Joshua Watt <JPEWhacker@gmail.com> wrote:

> Signature generation uses mkstemp() to get a file descriptor to a unique
> file and then write the signature into it, however it closed the file
> before doing the chmod() and rename() operations. Closing the file means
> that other mkstemp() could potentially open the same file and race with
> the chmod() and rename(), causing a error. While it may not sound like
> this would be very likely, glibc (at least) generates the filename for
> mkstemp() using the system clock, meaning that it is much more likely
> for highly parallel builds sharing sstate over NFS to encounter the race
> condition.
>
> To fix the problem, perform the chmod() and rename() while the file is
> still open, since this prevents other mkstemp() calls from being able to
> open the file (due to the O_EXCL flag).
>
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>  bitbake/lib/bb/siggen.py | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
> index 3f3d6df54d..55f25235df 100644
> --- a/bitbake/lib/bb/siggen.py
> +++ b/bitbake/lib/bb/siggen.py
> @@ -426,18 +426,18 @@ class SignatureGeneratorBasic(SignatureGenerator):
>                  sigfile = sigfile.replace(self.taskhash[tid],
> computed_taskhash)
>
>          fd, tmpfile = tempfile.mkstemp(dir=os.path.dirname(sigfile),
> prefix="sigtask.")
> -        try:
> -            with bb.compress.zstd.open(fd, "wt", encoding="utf-8",
> num_threads=1) as f:
> +        with bb.compress.zstd.open(fd, "wt", encoding="utf-8",
> num_threads=1) as f:
> +            try:
>                  json.dump(data, f, sort_keys=True, separators=(",", ":"),
> cls=SetEncoder)
>                  f.flush()
> -            os.chmod(tmpfile, 0o664)
> -            bb.utils.rename(tmpfile, sigfile)
> -        except (OSError, IOError) as err:
> -            try:
> -                os.unlink(tmpfile)
> -            except OSError:
> -                pass
> -            raise err
> +                os.chmod(tmpfile, 0o664)
> +                bb.utils.rename(tmpfile, sigfile)
> +            except (OSError, IOError) as err:
> +                try:
> +                    os.unlink(tmpfile)
> +                except OSError:
> +                    pass
> +                raise err
>
>      def dump_sigfn(self, fn, dataCaches, options):
>          if fn in self.taskdeps:
> --
> 2.33.0
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#13862):
> https://lists.openembedded.org/g/bitbake-devel/message/13862
> Mute This Topic: https://lists.openembedded.org/mt/92757646/3617123
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [
> kergoth@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>
>

-- 
Christopher Larson
chris_larson@mentor.com, chris.larson@siemens.com, kergoth@gmail.com
Principal Software Engineer, Embedded Linux Solutions, Siemens Digital
Industries Software

[-- Attachment #2: Type: text/html, Size: 4609 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [bitbake-devel][PATCH v2] siggen: Fix sigtask data not being renamed atomically
  2022-08-01 20:34 [bitbake-devel][PATCH] siggen: Fix sigtask data not being renamed atomically Joshua Watt
  2022-08-01 21:05 ` Christopher Larson
@ 2022-08-02 13:14 ` Joshua Watt
  2022-08-03 11:40   ` Rasmus Villemoes
  1 sibling, 1 reply; 6+ messages in thread
From: Joshua Watt @ 2022-08-02 13:14 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Joshua Watt

Signature generation uses mkstemp() to get a file descriptor to a unique
file and then write the signature into it, however it closed the file
before doing the chmod() and rename() operations. Closing the file means
that other mkstemp() could potentially open the same file and race with
the chmod() and rename(), causing a error. While it may not sound like
this would be very likely, glibc (at least) generates the filename for
mkstemp() using the system clock, meaning that it is much more likely
for highly parallel builds sharing sstate over NFS to encounter the race
condition.

To fix the problem, perform the chmod() and rename() while the file is
still open, since this prevents other mkstemp() calls from being able to
open the file (due to the O_EXCL flag).

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index 3f3d6df54d..f3c8ff796f 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -426,18 +426,18 @@ class SignatureGeneratorBasic(SignatureGenerator):
                 sigfile = sigfile.replace(self.taskhash[tid], computed_taskhash)
 
         fd, tmpfile = tempfile.mkstemp(dir=os.path.dirname(sigfile), prefix="sigtask.")
-        try:
-            with bb.compress.zstd.open(fd, "wt", encoding="utf-8", num_threads=1) as f:
+        with bb.compress.zstd.open(fd, "wt", encoding="utf-8", num_threads=1) as f:
+            try:
                 json.dump(data, f, sort_keys=True, separators=(",", ":"), cls=SetEncoder)
                 f.flush()
-            os.chmod(tmpfile, 0o664)
-            bb.utils.rename(tmpfile, sigfile)
-        except (OSError, IOError) as err:
-            try:
-                os.unlink(tmpfile)
-            except OSError:
-                pass
-            raise err
+                os.fchmod(fd, 0o664)
+                bb.utils.rename(tmpfile, sigfile)
+            except (OSError, IOError) as err:
+                try:
+                    os.unlink(tmpfile)
+                except OSError:
+                    pass
+                raise err
 
     def dump_sigfn(self, fn, dataCaches, options):
         if fn in self.taskdeps:
-- 
2.33.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [bitbake-devel][PATCH v2] siggen: Fix sigtask data not being renamed atomically
  2022-08-02 13:14 ` [bitbake-devel][PATCH v2] " Joshua Watt
@ 2022-08-03 11:40   ` Rasmus Villemoes
  2022-08-03 13:21     ` Joshua Watt
  0 siblings, 1 reply; 6+ messages in thread
From: Rasmus Villemoes @ 2022-08-03 11:40 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Joshua Watt

On 02/08/2022 15.14, Joshua Watt via lists.openembedded.org wrote:
> Signature generation uses mkstemp() to get a file descriptor to a unique
> file and then write the signature into it, however it closed the file
> before doing the chmod() and rename() operations. Closing the file means
> that other mkstemp() could potentially open the same file and race with
> the chmod() and rename(), causing a error. While it may not sound like
> this would be very likely, glibc (at least) generates the filename for
> mkstemp() using the system clock, meaning that it is much more likely
> for highly parallel builds sharing sstate over NFS to encounter the race
> condition.
> 
> To fix the problem, perform the chmod() and rename() while the file is
> still open, since this prevents other mkstemp() calls from being able to
> open the file (due to the O_EXCL flag).

Eh, O_CREAT|O_EXCL should prevent opening an already existing file,
whether somebody has it open or not, so I don't see how another
mkstemp() could end up opening "our" file. Unless O_EXCL semantics are
broken on NFS and only work by accident when the local client does have
the file open - but what would then prevent other clients with,
presumably, the exact same system time from clobbering our file, whether
or not the chmod+rename is done with the fd still held?

The patch is probably fine (as in, shouldn't make anything worse), but I
think it would be nice to understand just what the problem actually is,
and if this fixes it, how.

Rasmus


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitbake-devel][PATCH v2] siggen: Fix sigtask data not being renamed atomically
  2022-08-03 11:40   ` Rasmus Villemoes
@ 2022-08-03 13:21     ` Joshua Watt
  2022-08-03 14:04       ` Joshua Watt
  0 siblings, 1 reply; 6+ messages in thread
From: Joshua Watt @ 2022-08-03 13:21 UTC (permalink / raw)
  To: Rasmus Villemoes, bitbake-devel

[-- Attachment #1: Type: text/plain, Size: 2438 bytes --]


On 8/3/22 06:40, Rasmus Villemoes wrote:
> On 02/08/2022 15.14, Joshua Watt via lists.openembedded.org wrote:
>> Signature generation uses mkstemp() to get a file descriptor to a unique
>> file and then write the signature into it, however it closed the file
>> before doing the chmod() and rename() operations. Closing the file means
>> that other mkstemp() could potentially open the same file and race with
>> the chmod() and rename(), causing a error. While it may not sound like
>> this would be very likely, glibc (at least) generates the filename for
>> mkstemp() using the system clock, meaning that it is much more likely
>> for highly parallel builds sharing sstate over NFS to encounter the race
>> condition.
>>
>> To fix the problem, perform the chmod() and rename() while the file is
>> still open, since this prevents other mkstemp() calls from being able to
>> open the file (due to the O_EXCL flag).
> Eh, O_CREAT|O_EXCL should prevent opening an already existing file,
> whether somebody has it open or not, so I don't see how another
> mkstemp() could end up opening "our" file. Unless O_EXCL semantics are
> broken on NFS and only work by accident when the local client does have
> the file open - but what would then prevent other clients with,
> presumably, the exact same system time from clobbering our file, whether
> or not the chmod+rename is done with the fd still held?

Yes, for some reason I had it my head that O_EXCL only affects open 
files; so I'm not sure. Maybe NFS is the culprit :/

>
> The patch is probably fine (as in, shouldn't make anything worse), but I
> think it would be nice to understand just what the problem actually is,
> and if this fixes it, how.

Ya, we are definetely getting errors like:


FileNotFoundError: [Errno 2] No such file or directory: 
'/mnt/releases/sstate/99/sigtask.wq6gkrm7' -> 
'/mnt/releases/sstate/99/sstate:hdparm::9.48:r0::3:990d8b39c15752a2dd68369cb3609283_unpack.tgz.siginfo'

With a backtrace to the os.rename() under heavy load. I don't understand 
how the chmod() can succeed and the rename fail unless there is a race. 
It's very hard to diagnose what went wrong post-mortem.

My other option was to manually add more entropy to the file names by 
prepending some random characters after "sigtask", since the 
"randomeness" of mkstemp() isn't great all tasks use the same sigtask 
file prefix (which increases the likely hood of a collision).
>
> Rasmus

[-- Attachment #2: Type: text/html, Size: 6171 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitbake-devel][PATCH v2] siggen: Fix sigtask data not being renamed atomically
  2022-08-03 13:21     ` Joshua Watt
@ 2022-08-03 14:04       ` Joshua Watt
  0 siblings, 0 replies; 6+ messages in thread
From: Joshua Watt @ 2022-08-03 14:04 UTC (permalink / raw)
  To: Rasmus Villemoes, bitbake-devel

On Wed, Aug 3, 2022 at 8:21 AM Joshua Watt <jpewhacker@gmail.com> wrote:
>
>
> On 8/3/22 06:40, Rasmus Villemoes wrote:
>
> On 02/08/2022 15.14, Joshua Watt via lists.openembedded.org wrote:
>
> Signature generation uses mkstemp() to get a file descriptor to a unique
> file and then write the signature into it, however it closed the file
> before doing the chmod() and rename() operations. Closing the file means
> that other mkstemp() could potentially open the same file and race with
> the chmod() and rename(), causing a error. While it may not sound like
> this would be very likely, glibc (at least) generates the filename for
> mkstemp() using the system clock, meaning that it is much more likely
> for highly parallel builds sharing sstate over NFS to encounter the race
> condition.
>
> To fix the problem, perform the chmod() and rename() while the file is
> still open, since this prevents other mkstemp() calls from being able to
> open the file (due to the O_EXCL flag).
>
> Eh, O_CREAT|O_EXCL should prevent opening an already existing file,
> whether somebody has it open or not, so I don't see how another
> mkstemp() could end up opening "our" file. Unless O_EXCL semantics are
> broken on NFS and only work by accident when the local client does have
> the file open - but what would then prevent other clients with,
> presumably, the exact same system time from clobbering our file, whether
> or not the chmod+rename is done with the fd still held?
>
> Yes, for some reason I had it my head that O_EXCL only affects open files; so I'm not sure. Maybe NFS is the culprit :/
>
> The patch is probably fine (as in, shouldn't make anything worse), but I
> think it would be nice to understand just what the problem actually is,
> and if this fixes it, how.
>
> Ya, we are definetely getting errors like:
>
>
> FileNotFoundError: [Errno 2] No such file or directory: '/mnt/releases/sstate/99/sigtask.wq6gkrm7' -> '/mnt/releases/sstate/99/sstate:hdparm::9.48:r0::3:990d8b39c15752a2dd68369cb3609283_unpack.tgz.siginfo'
>
> With a backtrace to the os.rename() under heavy load. I don't understand how the chmod() can succeed and the rename fail unless there is a race. It's very hard to diagnose what went wrong post-mortem.
>
> My other option was to manually add more entropy to the file names by prepending some random characters after "sigtask", since the "randomeness" of mkstemp() isn't great all tasks use the same sigtask file prefix (which increases the likely hood of a collision).

Ok, after further investigation, I realized that our NFS server (which
I don't have any control over) is not Linux, so I have the sneaking
suspicion that it doesn't implement O_EXCL correctly, or perhaps not
correctly across multiple clients. In light of that, this patch isn't
going to do what I want anyway, and I've sent a new patch that
increases the entropy to mkstemp() so it's not time based which should
solve the problem in all cases

>
> Rasmus


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-03 14:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-01 20:34 [bitbake-devel][PATCH] siggen: Fix sigtask data not being renamed atomically Joshua Watt
2022-08-01 21:05 ` Christopher Larson
2022-08-02 13:14 ` [bitbake-devel][PATCH v2] " Joshua Watt
2022-08-03 11:40   ` Rasmus Villemoes
2022-08-03 13:21     ` Joshua Watt
2022-08-03 14:04       ` Joshua Watt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.