* [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
@ 2026-02-13 12:46 George Hu
2026-02-13 15:36 ` Chris Torek
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: George Hu @ 2026-02-13 12:46 UTC (permalink / raw)
To: git; +Cc: George Hu, Junio C Hamano, Johannes Schindelin
The `sendfile()` system call copies data between one file descriptor
and another within the kernel, which is more efficient than the
combination of `read()` and `write()`.
Signed-off-by: George Hu <integral@archlinux.org>
---
copy.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/copy.c b/copy.c
index b668209b6c..d4b7cde764 100644
--- a/copy.c
+++ b/copy.c
@@ -7,8 +7,23 @@
#include "strbuf.h"
#include "abspath.h"
+#ifdef __linux__
+# include <sys/sendfile.h>
+#endif
+
int copy_fd(int ifd, int ofd)
{
+#ifdef __linux__
+ struct stat ifd_st;
+ size_t ifd_len;
+ ssize_t ret = 0;
+
+ fstat(ifd, &ifd_st);
+ ifd_len = ifd_st.st_size;
+
+ while (ifd_len && (ret = sendfile(ofd, ifd, NULL, ifd_len)) > 0)
+ ifd_len -= (size_t)ret;
+#else
while (1) {
char buffer[8192];
ssize_t len = xread(ifd, buffer, sizeof(buffer));
@@ -19,6 +34,8 @@ int copy_fd(int ifd, int ofd)
if (write_in_full(ofd, buffer, len) < 0)
return COPY_WRITE_ERROR;
}
+#endif
+
return 0;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
@ 2026-02-13 15:36 ` Chris Torek
2026-02-14 9:21 ` George Hu
2026-02-14 16:43 ` Phillip Wood
2026-02-15 7:43 ` Jeff King
2 siblings, 1 reply; 9+ messages in thread
From: Chris Torek @ 2026-02-13 15:36 UTC (permalink / raw)
To: George Hu; +Cc: git, Junio C Hamano, Johannes Schindelin
On Fri, Feb 13, 2026 at 4:47 AM George Hu <integral@archlinux.org> wrote:
> The `sendfile()` system call copies data between one file descriptor
> and another within the kernel, which is more efficient than the
> combination of `read()` and `write()`.
sendfile() is found on other systems (notably BSDs), so perhaps ...
> Signed-off-by: George Hu <integral@archlinux.org>
> ---
> copy.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/copy.c b/copy.c
> index b668209b6c..d4b7cde764 100644
> --- a/copy.c
> +++ b/copy.c
> @@ -7,8 +7,23 @@
> #include "strbuf.h"
> #include "abspath.h"
>
> +#ifdef __linux__
... this and the subsequent ifdef should be based on the feature,
rather than the OS.
Chris
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-13 15:36 ` Chris Torek
@ 2026-02-14 9:21 ` George Hu
2026-02-14 16:50 ` Chris Torek
0 siblings, 1 reply; 9+ messages in thread
From: George Hu @ 2026-02-14 9:21 UTC (permalink / raw)
To: Chris Torek; +Cc: git, Junio C Hamano, Johannes Schindelin
On 2/13/26 11:36 PM, Chris Torek wrote:
> On Fri, Feb 13, 2026 at 4:47 AM George Hu <integral@archlinux.org> wrote:
>> The `sendfile()` system call copies data between one file descriptor
>> and another within the kernel, which is more efficient than the
>> combination of `read()` and `write()`.
> sendfile() is found on other systems (notably BSDs), so perhaps ...
>
>> Signed-off-by: George Hu <integral@archlinux.org>
>> ---
>> copy.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/copy.c b/copy.c
>> index b668209b6c..d4b7cde764 100644
>> --- a/copy.c
>> +++ b/copy.c
>> @@ -7,8 +7,23 @@
>> #include "strbuf.h"
>> #include "abspath.h"
>>
>> +#ifdef __linux__
> ... this and the subsequent ifdef should be based on the feature,
> rather than the OS.
>
> Chris
Hello,
Although the `sendfile()` system call exists in both Linux and BSDs,
their semantics and APIs differ.
The Linux prototype of `sendfile()` is:
ssize_t sendfile(int out_fd, int in_fd, off_t *_Nullable offset, size_t
count);
While FreeBSD exposes:
int sendfile(int fd, int s, off_t offset, size_t nbytes, struct sf_hdtr
*hdtr, off_t *sbytes, int flags);
George
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
2026-02-13 15:36 ` Chris Torek
@ 2026-02-14 16:43 ` Phillip Wood
2026-02-15 6:23 ` George Hu
2026-02-15 7:43 ` Jeff King
2 siblings, 1 reply; 9+ messages in thread
From: Phillip Wood @ 2026-02-14 16:43 UTC (permalink / raw)
To: George Hu, git; +Cc: Junio C Hamano, Johannes Schindelin
On 13/02/2026 12:46, George Hu wrote:
> The `sendfile()` system call copies data between one file descriptor
> and another within the kernel, which is more efficient than the
> combination of `read()` and `write()`.
Does git copy any files big enough that this makes a noticeable difference?
> int copy_fd(int ifd, int ofd)
> {
> +#ifdef __linux__
Our normal practice when a function has platform specific
implementations is to host those implementations under compat/<platform>
(see the implementations of trace2_collect_process_information() for an
example)
> + struct stat ifd_st;
> + size_t ifd_len;
> + ssize_t ret = 0;
> +
> + fstat(ifd, &ifd_st);
What happens if fstat() fails?
> + ifd_len = ifd_st.st_size;
> +
> + while (ifd_len && (ret = sendfile(ofd, ifd, NULL, ifd_len)) > 0)
> + ifd_len -= (size_t)ret;
This does not propagate errors to the caller, if sendfile() fails the
function returns 0. write_in_full() handles non-blocking writes, we
should do the same here if we see EAGAIN. The man page lists various
restrictions on the file descriptors passed to sendfile() - I'm not sure
that they affect the uses of copy_file() in git but to be safe we should
fall back to the read()/write() loop if we see EINVAL.
Thanks
Phillip
> +#else
> while (1) {
> char buffer[8192];
> ssize_t len = xread(ifd, buffer, sizeof(buffer));
> @@ -19,6 +34,8 @@ int copy_fd(int ifd, int ofd)
> if (write_in_full(ofd, buffer, len) < 0)
> return COPY_WRITE_ERROR;
> }
> +#endif
> +
> return 0;
> }
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-14 9:21 ` George Hu
@ 2026-02-14 16:50 ` Chris Torek
2026-02-20 16:35 ` Ed Maste
0 siblings, 1 reply; 9+ messages in thread
From: Chris Torek @ 2026-02-14 16:50 UTC (permalink / raw)
To: George Hu; +Cc: git, Junio C Hamano, Johannes Schindelin
Ah, more importantly, FreeBSD's sendfile only operates on sockets.
Both systems also need fallback code for un-handled cases.
Chris
On Sat, Feb 14, 2026 at 1:22 AM George Hu <integral@archlinux.org> wrote:
>
> On 2/13/26 11:36 PM, Chris Torek wrote:
>
> > On Fri, Feb 13, 2026 at 4:47 AM George Hu <integral@archlinux.org> wrote:
> >> The `sendfile()` system call copies data between one file descriptor
> >> and another within the kernel, which is more efficient than the
> >> combination of `read()` and `write()`.
> > sendfile() is found on other systems (notably BSDs), so perhaps ...
> >
> >> Signed-off-by: George Hu <integral@archlinux.org>
> >> ---
> >> copy.c | 17 +++++++++++++++++
> >> 1 file changed, 17 insertions(+)
> >>
> >> diff --git a/copy.c b/copy.c
> >> index b668209b6c..d4b7cde764 100644
> >> --- a/copy.c
> >> +++ b/copy.c
> >> @@ -7,8 +7,23 @@
> >> #include "strbuf.h"
> >> #include "abspath.h"
> >>
> >> +#ifdef __linux__
> > ... this and the subsequent ifdef should be based on the feature,
> > rather than the OS.
> >
> > Chris
>
> Hello,
>
> Although the `sendfile()` system call exists in both Linux and BSDs,
> their semantics and APIs differ.
> The Linux prototype of `sendfile()` is:
>
> ssize_t sendfile(int out_fd, int in_fd, off_t *_Nullable offset, size_t
> count);
>
> While FreeBSD exposes:
>
> int sendfile(int fd, int s, off_t offset, size_t nbytes, struct sf_hdtr
> *hdtr, off_t *sbytes, int flags);
>
> George
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-14 16:43 ` Phillip Wood
@ 2026-02-15 6:23 ` George Hu
0 siblings, 0 replies; 9+ messages in thread
From: George Hu @ 2026-02-15 6:23 UTC (permalink / raw)
To: phillip.wood, git; +Cc: Junio C Hamano, Johannes Schindelin
On 2/15/26 12:43 AM, Phillip Wood wrote:
> On 13/02/2026 12:46, George Hu wrote:
>> The `sendfile()` system call copies data between one file descriptor
>> and another within the kernel, which is more efficient than the
>> combination of `read()` and `write()`.
>
> Does git copy any files big enough that this makes a noticeable
> difference?
>
>> int copy_fd(int ifd, int ofd)
>> {
>> +#ifdef __linux__
>
> Our normal practice when a function has platform specific
> implementations is to host those implementations under compat/<platform>
> (see the implementations of trace2_collect_process_information() for
> an example)
>
The Linux implementation of `trace2_collect_process_information()`
resides in compat/linux with a stub version in compat/stub. After moving
the Linux-specifc `copy_fd()` implementation into compat/linux, where
should the generic implementation be placed?
>> + struct stat ifd_st;
>> + size_t ifd_len;
>> + ssize_t ret = 0;
>> +
>> + fstat(ifd, &ifd_st);
>
> What happens if fstat() fails?
>
>> + ifd_len = ifd_st.st_size;
>> +
>> + while (ifd_len && (ret = sendfile(ofd, ifd, NULL, ifd_len)) > 0)
>> + ifd_len -= (size_t)ret;
>
> This does not propagate errors to the caller, if sendfile() fails the
> function returns 0. write_in_full() handles non-blocking writes, we
> should do the same here if we see EAGAIN. The man page lists various
> restrictions on the file descriptors passed to sendfile() - I'm not
> sure that they affect the uses of copy_file() in git but to be safe we
> should fall back to the read()/write() loop if we see EINVAL.
>
According to the manual, `sendfile()` returns -1 on failure; a return
value of 0 indicates EOF.
There are error cases besides EAGAIN and EINVAL. Maybe we should fall
back to the read() / write() loop for errors other than EAGAIN?
Sincerely,
George
> Thanks
>
> Phillip
>
>> +#else
>> while (1) {
>> char buffer[8192];
>> ssize_t len = xread(ifd, buffer, sizeof(buffer));
>> @@ -19,6 +34,8 @@ int copy_fd(int ifd, int ofd)
>> if (write_in_full(ofd, buffer, len) < 0)
>> return COPY_WRITE_ERROR;
>> }
>> +#endif
>> +
>> return 0;
>> }
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
2026-02-13 15:36 ` Chris Torek
2026-02-14 16:43 ` Phillip Wood
@ 2026-02-15 7:43 ` Jeff King
2 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2026-02-15 7:43 UTC (permalink / raw)
To: George Hu; +Cc: git, Junio C Hamano, Johannes Schindelin
On Fri, Feb 13, 2026 at 08:46:56PM +0800, George Hu wrote:
> The `sendfile()` system call copies data between one file descriptor
> and another within the kernel, which is more efficient than the
> combination of `read()` and `write()`.
OK, but...does this efficiency matter for the callers of copy_file() and
friends? Just skimming over grep results, it mostly seems to be used
with files we'd expect to be small. Some possible exceptions I can see:
- bundle-uri with a file:// uri will use it (so this could be a big
packfile)
- in "clone --local" mode without hardlinks available, we might copy a
packfile
To some degree, if it's easy to use sendfile(), we should just do so if
it might be a bit faster. But as responses from others showed, there are
some complications we'd have to deal with (portability, fallback, etc).
So I think it would be a lot more compelling if we could show a
measurable speedup for some real world operation.
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-14 16:50 ` Chris Torek
@ 2026-02-20 16:35 ` Ed Maste
2026-02-20 16:48 ` Collin Funk
0 siblings, 1 reply; 9+ messages in thread
From: Ed Maste @ 2026-02-20 16:35 UTC (permalink / raw)
To: Chris Torek; +Cc: George Hu, git, Junio C Hamano, Johannes Schindelin
On Sat, 14 Feb 2026 at 11:50, Chris Torek <chris.torek@gmail.com> wrote:
>
> Ah, more importantly, FreeBSD's sendfile only operates on sockets.
True. If benchmarking shows this is profitable then we'd want to use
copy_file_range(2) on FreeBSD.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux
2026-02-20 16:35 ` Ed Maste
@ 2026-02-20 16:48 ` Collin Funk
0 siblings, 0 replies; 9+ messages in thread
From: Collin Funk @ 2026-02-20 16:48 UTC (permalink / raw)
To: Ed Maste; +Cc: Chris Torek, George Hu, git, Junio C Hamano, Johannes Schindelin
Ed Maste <emaste@freebsd.org> writes:
> On Sat, 14 Feb 2026 at 11:50, Chris Torek <chris.torek@gmail.com> wrote:
>>
>> Ah, more importantly, FreeBSD's sendfile only operates on sockets.
>
> True. If benchmarking shows this is profitable then we'd want to use
> copy_file_range(2) on FreeBSD.
We use copy_file_range in GNU Coreutils. Note that it has quite a few
issues [1], including a recent one affecting files larger than INT_MAX
bytes [2][3].
Coreutils has Gnulib to work around this stuff, and the performance
improvement is meaningful for 'cp'. Based on earlier messages in this
thread, I am not sure if it is worth dealing with in this case.
Collin
[1] https://lwn.net/Articles/789527/
[2] https://sourceware.org/PR33245
[3] https://bugs.gnu.org/79139
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-02-20 16:48 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-13 12:46 [PATCH] copy.c: use `sendfile()` for in-kernel file copying on Linux George Hu
2026-02-13 15:36 ` Chris Torek
2026-02-14 9:21 ` George Hu
2026-02-14 16:50 ` Chris Torek
2026-02-20 16:35 ` Ed Maste
2026-02-20 16:48 ` Collin Funk
2026-02-14 16:43 ` Phillip Wood
2026-02-15 6:23 ` George Hu
2026-02-15 7:43 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox