Buildroot Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Arnout Vandecappelle <arnout@mind.be>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH 2/5] support/download: properly use temp files
Date: Tue, 08 Jul 2014 18:42:05 +0200	[thread overview]
Message-ID: <53BC1F5D.2050407@mind.be> (raw)
In-Reply-To: <20140707213802.GC3806@free.fr>

On 07/07/14 23:38, Yann E. MORIN wrote:
> Arnout, All,
> 
> On 2014-07-07 08:11 +0200, Arnout Vandecappelle spake thusly:
>> On 06/07/14 23:27, Yann E. MORIN wrote:
>>> When using 'mv' to move files to temp files, the existing temp file is
>>> forcibly removed by 'mv', and a new file is created.
>>>
>>> Although this should have little impact to us, there is still a race
>>> conditions in case two parallel downloads would step on each other's
>>> toes, and it goes like that:
>>>
>>>     Process 1                       Process 2
>>>     mktemp tmp_output
>>>     mv tmp_dl tmp_output
>>>         removes tmp_output
>>>                                     mktemp tmp_ouput
>>>         writes to tmp_output
>>>                                     mv tmp_dl tmp_output
>>>                                         removes tmp_output
>>>                                         writes to tmp_output
>>>     mv tmp_output output
>>>                                     mv tmp_output output
>>>
>>> In this case, mktemp has the opportunity to create the same tmp_output
>>> temporary file, and we trash the content from one with the content
>>> from the other, and the last to do the final 'mv' breaks horibly.
>>>
>>> Instead, use 'cat', which guarantees that tmp_output is not removed
>>> before writing to it.
>>
>>  Not that it makes a real difference, but I think that 'cp' is a more natural
>> way to do this.
> 
> I am not sure how cp handles copying over an existing file. I'll
> check...

 It does. It only unlinks if open(O_TRUNC) fails. (Checked with strace for
coreutils and my reading the source for busybox.)


> 
>>> This complexifies a bit the local-files (cp) helper, but better safe
>>> than sorry.
>>>
>>> (Note: this is a purely theoretical issue, I did not observe it yet, but
>>> it is slated to happen one day or the other, and will be very hard to
>>> debug then.)
>>
>>  Actually, since the mktemp string is based on time and PID, the chance of this
>> happening is really vanishingly small. That said, better safe than sorry.
> 
> Yes, the opportunity for a collision is very low, but it's not
> impossible. After all, that's the problem with race conditions: given
> sufficient time and trials, you'll hit it, and it is very hard to debug,
> especially in this case, since the condition is not reproducible (random
> names.)

 Well, since mktemp uses /dev/urandom to generate the file pattern (I was wrong
about time/pid, that's only if urandom is not available), the 6-character random
number has a collision frequency of about 10^10. The chance that such a
collision occurs at exactly the moment that we're doing two parallel downloads
of the same source is pretty darn small - much smaller than the chance of a sha1
collision I think.

> 
>>> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
>>> ---
>>>  support/download/bzr  | 2 +-
>>>  support/download/cp   | 9 ++++++---
>>>  support/download/cvs  | 2 +-
>>>  support/download/git  | 4 ++--
>>>  support/download/hg   | 2 +-
>>>  support/download/scp  | 2 +-
>>>  support/download/svn  | 2 +-
>>>  support/download/wget | 2 +-
>>>  8 files changed, 14 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/support/download/bzr b/support/download/bzr
>>> index 19d837d..f86fa82 100755
>>> --- a/support/download/bzr
>>> +++ b/support/download/bzr
>>> @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
>>>  
>>>  ret=1
>>>  if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then
>>> -    if mv "${tmp_dl}" "${tmp_output}"; then
>>> +    if cat "${tmp_dl}" >"${tmp_output}"; then
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>>      fi
>>
>>  Not really related to this patch, but why do we need this ${tmp_dl} to begin
>> with? Especially since we're already "occupying" a tempfile in DL_DIR anyway.
> 
> The idea was not to pollute BR2_DL_DIR, in case the download fails.
> Hence this dance:
>   - download to a disposable area (BUILD_DIR);
>   - move to a temp file in BR2_DL_DIR;
>   - atomically rename to the final file.
> 
> But granted, since  we remove failed downloads, we can skip the
> intermediary temp file in BUILD_DIR (although I still do not like much
> the idea of polluting BR2_DL_DIR.)

 Well, now you pollute BR2_DL_DIR with ${tmp_output} instead of ${tmp_dl}. Oh,
now I get it: if a download is interrupted, you'll leave the partial download in
BUILD_DIR instead of DL_DIR. OK, sorry about the noise.

> 
> I think there is still a case for which we can't remove the temp file...
> Ah yes, if the user cancels a build with Ctrl-C, the script is aborted,
> so we leave a .XXXXXX file in BR2_DL_DIR.

 Ctrl-C is SIGINT so it can be trap'ped.

> 
> But even with the three-step dance above, there is still a small window
> for leaving out cruft in BR2_DL_DIR; but that window is much, much
> smaller than the download window.

 Ack.

> 
>>  Also, with this added complexity, the download helper scripts are becoming
>> quite similar. So wouldn't it be a good idea to refactor them?
> 
> I think we already had this discussion back when I introduced the series:
>     http://lists.busybox.net/pipermail/buildroot/2014-January/086826.html
> 
> But for now, there are a few important fixes I'd like be applied before
> going further.

 OK.


> 
>> I'm thinking of
>> something like this:
>>
>> support/download/helper:
>>
>> #!/bin/bash
>>
>> # We want to catch any command failure, and exit immediately
>> set -e
>>
>> # Generic download helper
>> # Call it with:
>> #   $1: specific download helper
>> #   $2: output file
>> #   ...: arguments for the specific download helper
>>
>> download="${1}"
>> output="${2}"
>> shift 2
>>
>> tmp_output="$( mktemp ... )"
>> if "${download}" "${tmp_output}" "$@"; then
>> 	# Blah blah need things to be atomic blah blah
>> 	mv "${tmp_output}" "${output}"
>> fi
> 
> Oh, you meant having a wrapper script above all the other helpers, and
> delegate to the helpers only the download, and handle the atomicity
> dance to the wrapper. Hmmm... Might be worth looking at, probably, yes.

 But if you have more urgent fixes, that's OK.

> 
>>  I expect there will be even more common stuff between the download helpers in
>> the future, so it makes sense to have this.
> 
> Well, for one, moving the hash check in that wrapper, for example...
> 
[snip]
>>> diff --git a/support/download/git b/support/download/git
>>> index badb491..b0031e5 100755
>>> --- a/support/download/git
>>> +++ b/support/download/git
>>> @@ -43,8 +43,8 @@ fi
>>>  
>>>  ret=1
>>>  pushd "${repodir}" >/dev/null
>>> -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \
>>> -                  --format=tar "${cset}"; then
>>> +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \
>>> +                  >"${tmp_tar}"; then
>>
>>  What's the reason for this change? I've checked with strace, and git archive
>> doesn't seem to unlink, it just does open(O_TRUNC) like cp.
> 
> Ah, so you did strace both? I straced scp, and it does unlink.
> Are we sure all cp and all tar we can encounter will all use
> open(O_TRUNC) ?

 Well, the unlink behaviour of mv is rather surprising if you ask me. Normally
when you create a file that already exists, you'd want it to retain the
attributes it has (owner, group, mod, acl, symlink, hardlinks).

> 
> It'd rather go on the safe side, and not assume much about this,
> especially since that behaviour may change in the future, or may not
> show in older versions, or different versions.

 OK, fair enough.

> 
>>>      if gzip -c "${tmp_tar}" >"${tmp_output}"; then
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>> diff --git a/support/download/hg b/support/download/hg
>>> index 6e9e26b..8b36db9 100755
>>> --- a/support/download/hg
>>> +++ b/support/download/hg
>>> @@ -35,7 +35,7 @@ ret=1
>>>  if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then
>>>      if ${HG} archive --repository "${repodir}" --type tgz \
>>>                       --prefix "${basename}" --rev "${cset}" \
>>> -                     "${tmp_output}"; then
>>> +                     - >"${tmp_output}"; then
>>
>>  I didn't check for hg, but I also don't expect it to unlink() first. In fact,
>> it's mv's behaviour of unlinking first that is surprising.
> 
> Well, scp also unlinks first.

 No it doesn't... Or else I'm doing something really wrong with my strace...


 Regards,
 Arnout

> 
> And same answer, I don;t want to rely on such a behaviour. Redirecting
> works the same everywhere, though.
> 
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>>      fi
>>> diff --git a/support/download/scp b/support/download/scp
>>> index d3aad43..e16ece5 100755
>>> --- a/support/download/scp
>>> +++ b/support/download/scp
>>> @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
>>>  
>>>  ret=1
>>>  if ${SCP} "${url}" "${tmp_dl}"; then
>>> -    if mv "${tmp_dl}" "${tmp_output}"; then
>>> +    if cat "${tmp_dl}" >"${tmp_output}"; then
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>>      fi
>>> diff --git a/support/download/svn b/support/download/svn
>>> index 39cbbcb..259d3ed 100755
>>> --- a/support/download/svn
>>> +++ b/support/download/svn
>>> @@ -33,7 +33,7 @@ rm -rf "${repodir}"
>>>  
>>>  ret=1
>>>  if ${SVN} export "${repo}@${rev}" "${repodir}"; then
>>> -    if tar czf "${tmp_output}" "${repodir}"; then
>>> +    if tar czf - "${repodir}" >"${tmp_output}"; then
>>
>>  Again, tar doesn't unlink so this isn't needed.
> 
> Ditto.
> 
> Regards,
> Yann E. MORIN.
> 


-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7CB5 E4CC 6C2E EFD4 6E3D A754 F963 ECAB 2450 2F1F

  reply	other threads:[~2014-07-08 16:42 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-06 21:27 [Buildroot] [PATCH 0/5] Cleanups in download helpers (branch yem/check-downloads) Yann E. MORIN
2014-07-06 21:27 ` [Buildroot] [PATCH 1/5] support/download: fix the bzr helper Yann E. MORIN
2014-07-07  5:54   ` Arnout Vandecappelle
2014-07-06 21:27 ` [Buildroot] [PATCH 2/5] support/download: properly use temp files Yann E. MORIN
2014-07-07  6:11   ` Arnout Vandecappelle
2014-07-07 21:38     ` Yann E. MORIN
2014-07-08 16:42       ` Arnout Vandecappelle [this message]
2014-07-08 21:52         ` Yann E. MORIN
2014-07-09  7:45       ` Thomas Petazzoni
2014-07-10 15:59         ` Yann E. MORIN
2014-07-06 21:27 ` [Buildroot] [PATCH 3/5] support/download: simplify the local-files helper Yann E. MORIN
2014-07-08  8:49   ` Peter Korsgaard
2014-07-06 21:27 ` [Buildroot] [PATCH 4/5] support/download: only create final temp file when needed Yann E. MORIN
2014-07-07 16:08   ` Arnout Vandecappelle
2014-07-07 21:53     ` Yann E. MORIN
2014-07-08 15:53       ` Arnout Vandecappelle
2014-07-06 21:27 ` [Buildroot] [PATCH 5/5] support/download: rationalise naming and use of the temporary files Yann E. MORIN
2014-07-06 21:40   ` Yann E. MORIN

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53BC1F5D.2050407@mind.be \
    --to=arnout@mind.be \
    --cc=buildroot@busybox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox