open(2) vs fopen(3)

* open(2) vs fopen(3)
@ 2006-09-14  9:15 moreau francis
  2006-09-14 10:52 ` Andy Whitcroft
  2006-09-14 15:46 ` Linus Torvalds
  0 siblings, 2 replies; 5+ messages in thread
From: moreau francis @ 2006-09-14  9:15 UTC (permalink / raw)
  To: git

Hi GIT folks,

I'm reading the git source code and ran into this stupid question:
Why is open(2) used in some places and fopen(3) preferred in
others? I'm sorry for this dumb question but I have no clue.

thanks

Francis

* Re: open(2) vs fopen(3)
  2006-09-14  9:15 open(2) vs fopen(3) moreau francis
@ 2006-09-14 10:52 ` Andy Whitcroft
  2006-09-14 15:46 ` Linus Torvalds
  1 sibling, 0 replies; 5+ messages in thread
From: Andy Whitcroft @ 2006-09-14 10:52 UTC (permalink / raw)
  To: moreau francis; +Cc: git

moreau francis wrote:
> Hi GIT folks,
> 
> I'm reading the git source code and ran into this stupid question:
> Why is open(2) used in some places and fopen(3) preferred in
> others? I'm sorry for this dumb question but I have no clue.

From a quick random sampling, it looks very much as if open is used where
we are going to mmap the file for quick access, and fopen otherwise.
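
A minimal sketch of the open-then-mmap pattern described above. The
helper name map_file_readonly() is hypothetical and is not taken from
the git sources; it only illustrates why a raw file descriptor is
needed here, since mmap(2) takes an fd rather than a FILE pointer.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from git): map a whole file read-only.
 * Returns the mapping and stores its length in *len, or returns NULL
 * with errno set by whichever step failed.
 */
void *map_file_readonly(const char *path, size_t *len)
{
	struct stat st;
	void *map = MAP_FAILED;
	int saved_errno;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0)
		goto out;
	if (st.st_size == 0) {
		errno = EINVAL;	/* mmap() rejects zero-length mappings */
		goto out;
	}
	*len = (size_t)st.st_size;
	map = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
out:
	saved_errno = errno;
	close(fd);	/* an established mapping stays valid after close() */
	errno = saved_errno;
	return map == MAP_FAILED ? NULL : map;
}
```

The caller is expected to release the mapping with munmap(map, len)
when it is done with the data.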

-apw

* Re: open(2) vs fopen(3)
  2006-09-14  9:15 open(2) vs fopen(3) moreau francis
  2006-09-14 10:52 ` Andy Whitcroft
@ 2006-09-14 15:46 ` Linus Torvalds
  2006-09-14 16:37   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2006-09-14 15:46 UTC (permalink / raw)
  To: moreau francis; +Cc: git

On Thu, 14 Sep 2006, moreau francis wrote:
> 
> I'm reading the git source code and ran into this stupid question:
> Why is open(2) used in some places and fopen(3) preferred in
> others? I'm sorry for this dumb question but I have no clue.

fopen() tends to result in easier usage, especially if the file in 
question is a line-based ASCII file, and you can just use "fgets()" to 
read it. So fopen is the simple alternative for simple problems.

Using a direct open() means that you have to use the low-level IO 
functions (I'm ignoring the use of "fdopen()"), but if done right, it has 
a number of advantages:

 - with proper use, it's potentially more efficient (though stdio is a
   lot more efficient than doing lots of small writes yourself without
   any buffering)

 - you can control the creation flags better (ie if you want to do an 
   exclusive open, you _have_ to use "open()" - there's no portable way to 
   say O_EXCL with "fopen()")

 - error conditions are a lot more obvious and repeatable with the 
   low-level things, at least so I find personally. Error handling with 
   stdio routines is _possible_, but probably because almost nobody ever 
   does it, it's not something that people are conditioned to do, so it 
   ends up being "strange".

   (So this third one is more a psychological issue than a really 
   technical issue - at least for me. I'd not use stdio for things I 
   might expect to do fsync() on, for example. It's _possible_, but very 
   non-intuitive, because that's not how people generally use stdio).
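
The second and third points can be sketched together: an exclusive
create with O_EXCL, a write loop that checks every return value, and
an fsync() before closing. The helper create_exclusive() is
hypothetical and only illustrates the pattern; it is not how git
itself writes objects or refs. (For what it's worth, C11 later added
an "x" suffix to the fopen() mode string for exclusive creation, which
did not exist when this thread was written.)

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` only if it does not exist yet,
 * write `len` bytes from `buf`, and force them to disk.
 * Returns 0 on success, -1 with errno set on failure.
 */
int create_exclusive(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int saved_errno;
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd < 0)
		return -1;	/* errno is EEXIST if the file already exists */

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: just retry */
			goto fail;
		}
		p += n;		/* short write: advance and keep going */
		len -= (size_t)n;
	}
	if (fsync(fd) < 0)
		goto fail;
	return close(fd);

fail:
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return -1;
}
```

Every failure is visible at the call that caused it, through one
return value plus errno, which is the "obvious and repeatable"
property described in the third point.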

So it boils down to the fact that people tend to do higher-level things 
with stdio interfaces (fopen and friends), and lower-level things with the 
raw system call ("unistd.h") interfaces.

In git, you'd expect to see code that actually works on the object 
database or the refs using "open()" (both because it's low-level, and it 
generally wants to use O_EXCL and friends), and then things that open the 
".gitignore" file to use fopen() (because it's a line-based ASCII 
interface, and it's not an "important" file in the sense that we don't 
really care about some strange situation where it could give us an IO 
error).

There might also be a difference in personality. I probably tend to use 
the core unistd interfaces more than some other people would, and some 
other people might end up using stdio for pretty much everything.

			Linus

* Re: open(2) vs fopen(3)
  2006-09-14 15:46 ` Linus Torvalds
@ 2006-09-14 16:37   ` Junio C Hamano
  2006-09-14 17:31     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2006-09-14 16:37 UTC (permalink / raw)
  To: moreau francis; +Cc: git, Linus Torvalds

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 14 Sep 2006, moreau francis wrote:
>> 
>> I'm reading the git source code and ran into this stupid question:
>> Why is open(2) used in some places and fopen(3) preferred in
>> others? I'm sorry for this dumb question but I have no clue.
>
> fopen() tends to result in easier usage, especially if the file in 
> question is a line-based ASCII file, and you can just use "fgets()" to 
> read it. So fopen is the simple alternative for simple problems.
>
> Using a direct open() means that you have to use the low-level IO 
> functions (I'm ignoring the use of "fdopen()"), but if done right, it has 
> a number of advantages:
>...
>  - error conditions are a lot more obvious and repeatable with the 
>    low-level things, at least so I find personally. Error handling with 
>    stdio routines is _possible_, but probably because almost nobody ever 
>    does it, it's not something that people are conditioned to do, so it 
>    ends up being "strange".

Another issue related to this is that stdio implementations
tend to have unintuitive interaction with signals, one fine
example of it being the problem we fixed with commit fb7a653,
where on Solaris fgets(3) did not restart the underlying read(2)
upon SIGALRM.

Technically it was a bug on our part, not Solaris's, but it was
still something unexpected to see.

* Re: open(2) vs fopen(3)
  2006-09-14 16:37   ` Junio C Hamano
@ 2006-09-14 17:31     ` Linus Torvalds
  0 siblings, 0 replies; 5+ messages in thread
From: Linus Torvalds @ 2006-09-14 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: moreau francis, git

On Thu, 14 Sep 2006, Junio C Hamano wrote:
>
> Another issue related to this is that stdio implementations
> tend to have unintuitive interaction with signals, one fine
> example of it being the problem we fixed with commit fb7a653,
> where on Solaris fgets(3) did not restart the underlying read(2)
> upon SIGALRM.

Yeah. However, I think it's worth just posting the code in question to 
explain _why_ error handling with stdio sucks so badly, and why nobody 
does it..

Here's the snippet:

                if (!fgets(line, sizeof(line), stdin)) {
                        if (feof(stdin))
                                break;
                        if (!ferror(stdin))
                                die("fgets returned NULL, not EOF, not error!");
                        if (errno != EINTR)
                                die("fgets: %s", strerror(errno));
                        clearerr(stdin);
                        continue;	/* interrupted by a signal: retry */
                }

so with the <stdio.h> functions, you have to check FOUR DIFFERENT THINGS 
(1: return value, 2: feof() value, 3: ferror() value, and 4: errno) to get 
things right, and to add insult to injury, you then have to do an explicit 
clear.

In other words, the fundamental reason nobody bothers checking errors with 
stdio is that stdio just makes it a damn pain in the ass to do so - by 
having a million different things you have to do (and ordering actually 
matters).

In contrast, the <unistd.h> interfaces are a paragon of clarity: you check 
just two things - the return value, and possibly "errno".

Now, <unistd.h> isn't perfect either, and in the kernel we have simplified 
things further, by getting rid of "errno", and just having the return 
value contain errno too. Making things not only trivially thread-safe, but 
also actually easier to code and understand, because you don't have 
anything to be confused about: the return value is always the only thing 
you need to look at in order to know what went wrong.
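
A userspace imitation of that convention, as a rough sketch: the
wrapper open_ret() is a hypothetical name, and real kernel code does
not go through errno at all. The point is only that the caller looks
at a single value, where a negative result is the negated error code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper in the kernel's style: returns a file
 * descriptor (>= 0) on success or a negative errno value on failure.
 * `mode` is only used by open() when `flags` includes O_CREAT.
 */
int open_ret(const char *path, int flags, mode_t mode)
{
	int fd = open(path, flags, mode);

	return fd < 0 ? -errno : fd;
}
```

A caller can then write "fd = open_ret(path, O_RDONLY, 0); if (fd < 0)
return fd;" and pass the error code straight up the call chain without
touching any global state.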

But unistd.h sure is a lot better than stdio in this area. Of course, 
stdio.h is just a lot easier to use when you don't actually care about the 
errors, which is also partly the _reason_ why caring about errors is so 
hard (the whole separate clearerr() and ferror() interfaces exist exactly 
_because_ people don't care about errors in many cases, and you're 
supposed to maybe have some way to test at the end whether an error 
happened or not).

So stdio.h is pretty much geared towards delayed error handling, which in 
practice ends up often meaning "no error handling at all".

			Linus
