public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
@ 2026-02-13 12:46 George Hu
  2026-02-13 15:36 ` Chris Torek
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: George Hu @ 2026-02-13 12:46 UTC (permalink / raw)
  To: git; +Cc: George Hu, Junio C Hamano, Johannes Schindelin

The `sendfile()` system call copies data between one file descriptor
and another within the kernel, which is more efficient than the
combination of `read()` and `write()`.

Signed-off-by: George Hu <integral@archlinux.org>
---
 copy.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/copy.c b/copy.c
index b668209b6c..d4b7cde764 100644
--- a/copy.c
+++ b/copy.c
@@ -7,8 +7,23 @@
 #include "strbuf.h"
 #include "abspath.h"
 
+#ifdef __linux__
+# include <sys/sendfile.h>
+#endif
+
 int copy_fd(int ifd, int ofd)
 {
+#ifdef __linux__
+	struct stat ifd_st;
+	size_t ifd_len;
+	ssize_t ret = 0;
+
+	fstat(ifd, &ifd_st);
+	ifd_len = ifd_st.st_size;
+
+	while (ifd_len && (ret = sendfile(ofd, ifd, NULL, ifd_len)) > 0)
+		ifd_len -= (size_t)ret;
+#else
 	while (1) {
 		char buffer[8192];
 		ssize_t len = xread(ifd, buffer, sizeof(buffer));
@@ -19,6 +34,8 @@ int copy_fd(int ifd, int ofd)
 		if (write_in_full(ofd, buffer, len) < 0)
 			return COPY_WRITE_ERROR;
 	}
+#endif
+
 	return 0;
 }
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
@ 2026-02-13 15:36 ` Chris Torek
  2026-02-14  9:21   ` George Hu
  2026-02-14 16:43 ` Phillip Wood
  2026-02-15  7:43 ` Jeff King
  2 siblings, 1 reply; 9+ messages in thread
From: Chris Torek @ 2026-02-13 15:36 UTC (permalink / raw)
  To: George Hu; +Cc: git, Junio C Hamano, Johannes Schindelin

On Fri, Feb 13, 2026 at 4:47 AM George Hu <integral@archlinux.org> wrote:
> The `sendfile()` system call copies data between one file descriptor
> and another within the kernel, which is more efficient than the
> combination of `read()` and `write()`.

sendfile() is found on other systems (notably BSDs), so perhaps ...

> Signed-off-by: George Hu <integral@archlinux.org>
> ---
>  copy.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/copy.c b/copy.c
> index b668209b6c..d4b7cde764 100644
> --- a/copy.c
> +++ b/copy.c
> @@ -7,8 +7,23 @@
>  #include "strbuf.h"
>  #include "abspath.h"
>
> +#ifdef __linux__

... this and the subsequent ifdef should be based on the feature,
rather than the OS.

Chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-13 15:36 ` Chris Torek
@ 2026-02-14  9:21   ` George Hu
  2026-02-14 16:50     ` Chris Torek
  0 siblings, 1 reply; 9+ messages in thread
From: George Hu @ 2026-02-14  9:21 UTC (permalink / raw)
  To: Chris Torek; +Cc: git, Junio C Hamano, Johannes Schindelin

On 2/13/26 11:36 PM, Chris Torek wrote:

> On Fri, Feb 13, 2026 at 4:47 AM George Hu <integral@archlinux.org> wrote:
>> The `sendfile()` system call copies data between one file descriptor
>> and another within the kernel, which is more efficient than the
>> combination of `read()` and `write()`.
> sendfile() is found on other systems (notably BSDs), so perhaps ...
>
>> Signed-off-by: George Hu <integral@archlinux.org>
>> ---
>>   copy.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/copy.c b/copy.c
>> index b668209b6c..d4b7cde764 100644
>> --- a/copy.c
>> +++ b/copy.c
>> @@ -7,8 +7,23 @@
>>   #include "strbuf.h"
>>   #include "abspath.h"
>>
>> +#ifdef __linux__
> ... this and the subsequent ifdef should be based on the feature,
> rather than the OS.
>
> Chris

Hello,

Although the `sendfile()` system call exists in both Linux and BSDs, 
their semantics and APIs differ.
The Linux prototype of `sendfile()` is:

ssize_t sendfile(int out_fd, int in_fd, off_t *_Nullable offset, size_t 
count);

While FreeBSD exposes:

int sendfile(int fd, int s, off_t offset, size_t nbytes, struct sf_hdtr 
*hdtr, off_t *sbytes, int flags);

George

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
  2026-02-13 15:36 ` Chris Torek
@ 2026-02-14 16:43 ` Phillip Wood
  2026-02-15  6:23   ` George Hu
  2026-02-15  7:43 ` Jeff King
  2 siblings, 1 reply; 9+ messages in thread
From: Phillip Wood @ 2026-02-14 16:43 UTC (permalink / raw)
  To: George Hu, git; +Cc: Junio C Hamano, Johannes Schindelin

On 13/02/2026 12:46, George Hu wrote:
> The `sendfile()` system call copies data between one file descriptor
> and another within the kernel, which is more efficient than the
> combination of `read()` and `write()`.

Does git copy any files big enough that this makes a noticeable difference?

>   int copy_fd(int ifd, int ofd)
>   {
> +#ifdef __linux__

Our normal practice when a function has platform specific 
implementations is to host those implementations under compat/<platform>
(see the implementations of trace2_collect_process_information() for an 
example)

> +	struct stat ifd_st;
> +	size_t ifd_len;
> +	ssize_t ret = 0;
> +
> +	fstat(ifd, &ifd_st);

What happens if fstat() fails?

> +	ifd_len = ifd_st.st_size;
> +
> +	while (ifd_len && (ret = sendfile(ofd, ifd, NULL, ifd_len)) > 0)
> +		ifd_len -= (size_t)ret;

This does not propagate errors to the caller, if sendfile() fails the 
function returns 0. write_in_full() handles non-blocking writes, we 
should do the same here if we see EAGAIN. The man page lists various 
restrictions on the file descriptors passed to sendfile() - I'm not sure 
that they affect the uses of copy_file() in git but to be safe we should 
fall back to the read()/write() loop if we see EINVAL.

Thanks

Phillip

> +#else
>   	while (1) {
>   		char buffer[8192];
>   		ssize_t len = xread(ifd, buffer, sizeof(buffer));
> @@ -19,6 +34,8 @@ int copy_fd(int ifd, int ofd)
>   		if (write_in_full(ofd, buffer, len) < 0)
>   			return COPY_WRITE_ERROR;
>   	}
> +#endif
> +
>   	return 0;
>   }
>   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-14  9:21   ` George Hu
@ 2026-02-14 16:50     ` Chris Torek
  2026-02-20 16:35       ` Ed Maste
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Torek @ 2026-02-14 16:50 UTC (permalink / raw)
  To: George Hu; +Cc: git, Junio C Hamano, Johannes Schindelin

Ah, more importantly, FreeBSD's sendfile only operates on sockets.

Both systems also need fallback code for un-handled cases.

Chris

On Sat, Feb 14, 2026 at 1:22 AM George Hu <integral@archlinux.org> wrote:
>
> On 2/13/26 11:36 PM, Chris Torek wrote:
>
> > On Fri, Feb 13, 2026 at 4:47 AM George Hu <integral@archlinux.org> wrote:
> >> The `sendfile()` system call copies data between one file descriptor
> >> and another within the kernel, which is more efficient than the
> >> combination of `read()` and `write()`.
> > sendfile() is found on other systems (notably BSDs), so perhaps ...
> >
> >> Signed-off-by: George Hu <integral@archlinux.org>
> >> ---
> >>   copy.c | 17 +++++++++++++++++
> >>   1 file changed, 17 insertions(+)
> >>
> >> diff --git a/copy.c b/copy.c
> >> index b668209b6c..d4b7cde764 100644
> >> --- a/copy.c
> >> +++ b/copy.c
> >> @@ -7,8 +7,23 @@
> >>   #include "strbuf.h"
> >>   #include "abspath.h"
> >>
> >> +#ifdef __linux__
> > ... this and the subsequent ifdef should be based on the feature,
> > rather than the OS.
> >
> > Chris
>
> Hello,
>
> Although the `sendfile()` system call exists in both Linux and BSDs,
> their semantics and APIs differ.
> The Linux prototype of `sendfile()` is:
>
> ssize_t sendfile(int out_fd, int in_fd, off_t *_Nullable offset, size_t
> count);
>
> While FreeBSD exposes:
>
> int sendfile(int fd, int s, off_t offset, size_t nbytes, struct sf_hdtr
> *hdtr, off_t *sbytes, int flags);
>
> George

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-14 16:43 ` Phillip Wood
@ 2026-02-15  6:23   ` George Hu
  0 siblings, 0 replies; 9+ messages in thread
From: George Hu @ 2026-02-15  6:23 UTC (permalink / raw)
  To: phillip.wood, git; +Cc: Junio C Hamano, Johannes Schindelin

On 2/15/26 12:43 AM, Phillip Wood wrote:

> On 13/02/2026 12:46, George Hu wrote:
>> The `sendfile()` system call copies data between one file descriptor
>> and another within the kernel, which is more efficient than the
>> combination of `read()` and `write()`.
>
> Does git copy any files big enough that this makes a noticeable 
> difference?
>
>>   int copy_fd(int ifd, int ofd)
>>   {
>> +#ifdef __linux__
>
> Our normal practice when a function has platform specific 
> implementations is to host those implementations under compat/<platform>
> (see the implementations of trace2_collect_process_information() for 
> an example)
>

The Linux implementation of `trace2_collect_process_information()` 
resides in compat/linux with a stub version in compat/stub. After moving 
the Linux-specifc `copy_fd()` implementation into compat/linux, where 
should the generic implementation be placed?

>> +    struct stat ifd_st;
>> +    size_t ifd_len;
>> +    ssize_t ret = 0;
>> +
>> +    fstat(ifd, &ifd_st);
>
> What happens if fstat() fails?
>
>> +    ifd_len = ifd_st.st_size;
>> +
>> +    while (ifd_len && (ret = sendfile(ofd, ifd, NULL, ifd_len)) > 0)
>> +        ifd_len -= (size_t)ret;
>
> This does not propagate errors to the caller, if sendfile() fails the 
> function returns 0. write_in_full() handles non-blocking writes, we 
> should do the same here if we see EAGAIN. The man page lists various 
> restrictions on the file descriptors passed to sendfile() - I'm not 
> sure that they affect the uses of copy_file() in git but to be safe we 
> should fall back to the read()/write() loop if we see EINVAL.
>

According to the manual, `sendfile()` returns -1 on failure; a return 
value of 0 indicates EOF.

There are error cases besides EAGAIN and EINVAL. Maybe we should fall 
back to the read() / write() loop for errors other than EAGAIN?

Sincerely,
George

> Thanks
>
> Phillip
>
>> +#else
>>       while (1) {
>>           char buffer[8192];
>>           ssize_t len = xread(ifd, buffer, sizeof(buffer));
>> @@ -19,6 +34,8 @@ int copy_fd(int ifd, int ofd)
>>           if (write_in_full(ofd, buffer, len) < 0)
>>               return COPY_WRITE_ERROR;
>>       }
>> +#endif
>> +
>>       return 0;
>>   }
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
  2026-02-13 15:36 ` Chris Torek
  2026-02-14 16:43 ` Phillip Wood
@ 2026-02-15  7:43 ` Jeff King
  2 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2026-02-15  7:43 UTC (permalink / raw)
  To: George Hu; +Cc: git, Junio C Hamano, Johannes Schindelin

On Fri, Feb 13, 2026 at 08:46:56PM +0800, George Hu wrote:

> The `sendfile()` system call copies data between one file descriptor
> and another within the kernel, which is more efficient than the
> combination of `read()` and `write()`.

OK, but...does this efficiency matter for the callers of copy_file() and
friends? Just skimming over grep results, it mostly seems to be used
with files we'd expect to be small. Some possible exceptions I can see:

  - bundle-uri with a file:// uri will use it (so this could be a big
    packfile)

  - in "clone --local" mode without hardlinks available, we might copy a
    packfile

To some degree, if it's easy to use sendfile(), we should just do so if
it might be a bit faster. But as responses from others showed, there are
some complications we'd have to deal with (portability, fallback, etc).
So I think it would be a lot more compelling if we could show a
measurable speedup for some real world operation.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-14 16:50     ` Chris Torek
@ 2026-02-20 16:35       ` Ed Maste
  2026-02-20 16:48         ` Collin Funk
  0 siblings, 1 reply; 9+ messages in thread
From: Ed Maste @ 2026-02-20 16:35 UTC (permalink / raw)
  To: Chris Torek; +Cc: George Hu, git, Junio C Hamano, Johannes Schindelin

On Sat, 14 Feb 2026 at 11:50, Chris Torek <chris.torek@gmail.com> wrote:
>
> Ah, more importantly, FreeBSD's sendfile only operates on sockets.

True. If benchmarking shows this is profitable then we'd want to use
copy_file_range(2) on FreeBSD.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
  2026-02-20 16:35       ` Ed Maste
@ 2026-02-20 16:48         ` Collin Funk
  0 siblings, 0 replies; 9+ messages in thread
From: Collin Funk @ 2026-02-20 16:48 UTC (permalink / raw)
  To: Ed Maste; +Cc: Chris Torek, George Hu, git, Junio C Hamano, Johannes Schindelin

Ed Maste <emaste@freebsd.org> writes:

> On Sat, 14 Feb 2026 at 11:50, Chris Torek <chris.torek@gmail.com> wrote:
>>
>> Ah, more importantly, FreeBSD's sendfile only operates on sockets.
>
> True. If benchmarking shows this is profitable then we'd want to use
> copy_file_range(2) on FreeBSD.

We use copy_file_range in GNU Coreutils. Note that it has quite a few
issues [1], including a recent one affecting files larger than INT_MAX
bytes [2][3].

Coreutils has Gnulib to work around this stuff, and the performance
improvement is meaningful for 'cp'. Based on earlier messages in this
thread, I am not sure if it is worth dealing with in this case.

Collin

[1] https://lwn.net/Articles/789527/
[2] https://sourceware.org/PR33245
[3] https://bugs.gnu.org/79139

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-02-20 16:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
2026-02-13 15:36 ` Chris Torek
2026-02-14  9:21   ` George Hu
2026-02-14 16:50     ` Chris Torek
2026-02-20 16:35       ` Ed Maste
2026-02-20 16:48         ` Collin Funk
2026-02-14 16:43 ` Phillip Wood
2026-02-15  6:23   ` George Hu
2026-02-15  7:43 ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox