From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ej1-f68.google.com (mail-ej1-f68.google.com [209.85.218.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4BDB22BB1D for ; Wed, 28 Jan 2026 11:15:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.68 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769598928; cv=none; b=Ce+yZfkl+XyCOc2C7kWjRGs3XyioJxpG5QuH2QJ5p1GSGyHcw6pUUtD4QIYKp9y73raodibuiBMwx35Sz9g25io7ezKmYro8LFwinzHYDg5R++i68EWGQdLKM8huqyclTx/ZWFcsuAt7bmDj9Vku/56Obp4WOCUXmhDIjKm12cM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769598928; c=relaxed/simple; bh=NL/DT5lFVHmc4AILTs8xEI9v/ziaQ6ccwKBLIe+LOZU=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=MElTTWsMeFIwkBYF8MwX03nU/xZca2kKpE1BJQwHH9AZUfmRFiZmlW+ydWd7OldygPyajT7OIArDqn1Mt+dCCL3RcKQ/7b3gVtiwmxDQeK9ibRaoWGpzI9cPGaiuW0Rm2TBscVwBXywqgRmNXtpR+fJyOhrVR6rNsv7FZV0mbDA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=V3KpZSO5; arc=none smtp.client-ip=209.85.218.68 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="V3KpZSO5" Received: by mail-ej1-f68.google.com with SMTP id a640c23a62f3a-b872f1c31f1so920625266b.0 for ; Wed, 28 Jan 2026 03:15:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1769598925; x=1770203725; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:content-language:from :references:cc:to:subject:reply-to:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=EqFqSwKYc5TLYLGEBKfPN0OSPOSjNF/4qqN/6JkYfM4=; b=V3KpZSO5NZLhf4ZJAx7nLECX6rNNFM4hFMAsLCyix+zaJ0IsOM5fjUZEgycgPCCjK8 MSe4WtjBwDP+6z3XqzTrh1AExnYIwIuvuEZsLPlUfI+4R43F+pAOzX/wXPGh8RDmenFM FA05SFA4yjg5a5vFAwcYzEqPbac22svXPRH8zTNTDABtX5KRgXGRTHPrnpA1CHvGYiFx aIbOHz/kLKBBaG0CRtk01v3dgiYaiZTJLQYAVZ1hSnDYKxkuKlDw5WoGVTMUTR9ZOkts 369UTpg5je83cIZa5VW/inP39G7PstE8FjiB7rwdzySzMC3ff0UFkqTdLPJF7g6RH5+d c1yA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769598925; x=1770203725; h=content-transfer-encoding:in-reply-to:content-language:from :references:cc:to:subject:reply-to:user-agent:mime-version:date :message-id:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=EqFqSwKYc5TLYLGEBKfPN0OSPOSjNF/4qqN/6JkYfM4=; b=wxg0aL0ym57PU3+dndgF1JTol1zO9P6I2vA86CSf43P5WKoUdBL3Qo2VY4opolw4fI SKLn4BGFhQpyJpOKgSzVVfBtoP5YGOsi1XoM+yTqqB9H7QMhFC+BsXQcYRshu+BIR/sP xNwHhmgX2lox9qfmMK/GMoTzEUFAXmoFgEcG6Dlyw/HDLe7JnpEcHVUusp+lv6nyzNIN D/VNgingWJMcBT4jsL8scxT8wpS49B6HwlRHGYXK6jmp8GkfMT3O/wEQ2hIAhJnnvQ05 qHdFSCFp04Fv75FNyYba6D+LeOkUxuMGOBtJLvMKp98XVS7LTs9F5BUcWNlmTOqBzvbp jUZQ== X-Forwarded-Encrypted: i=1; AJvYcCVBeTkC5r5y1hksyGebIR1CjkUsPdKFxJ6x7ZGmGhG+KNQxw9xzT4UwQpi5sbJxX1PxXtQ=@vger.kernel.org X-Gm-Message-State: AOJu0Yyjg270JtQL1bg9xt62I0yy8gmXtqpu+3ckaxTb0kBa3JxAG8hk 76qb8dsubwNw9vaNWqmbAagI3VVual1xQC7hixLQ4sotPCpjfkZ78kiR X-Gm-Gg: AZuq6aJScywD7DFrKHXWCFU/Z5tcm7tmcozTRkJyblvG67g6Idw+TyrY05yUwhD4qMQ dd50aMunzyPY5WTCqdpOCXnku+Bbefje4pi6R/mUJdwCniq9cA4e8qG0nHMfEzhwxtI/l3PFTkf 4LxGeXlmsX9M680wRwMmf58W6bpCmkK+2c25s4m44rJWkLwLTrkThZbBcxdiob5uhPvrZUz3PAN lqdWRNLIAoI5PkZ++DBanDw6qzxOiAvtISZvb9gRAAzpXFgd+EFxX0ufskpVwYA2TZQO84k3EcR 2ga4ewa6iuTIoe9KZZFuPPJuMyMyrA559wC1e/k+GhCp0jcOLHmrasp1ELwbrZG9kQiQI/QXp+K gx5qPkei7fzivtPBeYl61Lg3l6u6F8w25/4L8Ieu1nrcdtEIsgO6Qd2t8WNLB0KyVznD7Odn2gv spsxu3pzKV4e4zvcZHiD7dh/xo82c0pg1MVHIW21Zi8InnqPAwvY9FIy3Wf5fJ7f+Hv3grTSAx0 C3Y X-Received: by 2002:a17:907:7f8c:b0:b89:9631:2425 with SMTP id a640c23a62f3a-b8dab1938d8mr321149166b.19.1769598924273; Wed, 28 Jan 2026 03:15:24 -0800 (PST) Received: from ?IPV6:2a0a:ef40:627:1f01:b22b:2092:b7ed:c8f5? ([2a0a:ef40:627:1f01:b22b:2092:b7ed:c8f5]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-b8dbf2ed58fsm112953266b.66.2026.01.28.03.15.23 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 28 Jan 2026 03:15:23 -0800 (PST) Message-ID: <7d6dd22d-286e-4f7b-a211-44e00e711401@gmail.com> Date: Wed, 28 Jan 2026 11:15:22 +0000 Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Reply-To: phillip.wood@dunelm.org.uk Subject: Re: [PATCH 01/10] ivec: introduce the C side of ivec To: Ezekiel Newren Cc: phillip.wood@dunelm.org.uk, Ezekiel Newren via GitGitGadget , git@vger.kernel.org References: <0437b899-5a36-4499-a30a-c2a074a80f7e@gmail.com> <08318339-03c3-4068-92fa-7a711bd13da0@gmail.com> From: Phillip Wood Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 21/01/2026 21:39, Ezekiel Newren wrote: > On Tue, Jan 20, 2026 at 7:06 AM Phillip Wood wrote: >> >> Hi Ezekiel >> >> On 15/01/2026 15:55, Ezekiel Newren wrote: >>> On Thu, Jan 8, 2026 at 7:34 AM Phillip Wood wrote: >>>>> +void ivec_reserve(void *self_, size_t additional) >>>>> +{ >>>>> + struct IVec_c_void *self = self_; >>>>> + >>>>> + size_t growby = 128; >>>>> + if (self->capacity > growby) >>>>> + growby = self->capacity; >>>>> + if (additional > growby) >>>>> + growby = additional; >>>> >>>> This growth strategy differs from both ALLOC_GROW() and >>>> XDL_ALLOC_GROW(), if there isn't a good reason for that we should >>>> perhaps just use ALLOC_GROW() here. >>> >>> XDL_ALLOW_GROW() can't be used because the pointer is always a void* >>> in this function. >> >> Oh right. I'm not sure that's not a reason to use a different growth >> strategy though. The minimum size of 128 elements is probably good for >> the xdiff code that creates arrays with one element per line but if this >> is supposed to be for general use it is going to waste space when we're >> allocating a lot of small arrays. ALLOC_GROW() uses alloc_nr() to >> calculate the new side so perhaps we could use that here? > > If ivec_reserve() isn't suitable then ivec_reserve_exact() should be > used instead. If some C code that pushes one element at a time to an array using ALLOC_GROW() is converted to use an ivec then we don't want to change the code behaves - that means it should grow the array in the same way. I don't see how the suggestion to use ivec_reserve_exact() helps in that situation. What is the advantage in having a different growth characteristic? >>>>> +void ivec_push(void *self_, const void *value) >>>>> +{ >>>>> + struct IVec_c_void *self = self_; >>>>> + void *dst = NULL; >>>>> + >>>>> + if (self->length == self->capacity) >>>>> + ivec_reserve(self, 1); >>>>> + >>>>> + dst = (uint8_t*)self->ptr + self->length * self->element_size; >>>>> + memcpy(dst, value, self->element_size); >>>> >>>> If self->element_size was a compile time constant the compiler could >>>> easily optimize this call away. I'm not sure that is easy to achieve though. >>> >>> The problem is that I didn't want all of ivec to be macros that looked >>> like function calls. I wanted to minimize use of macros so that it was >>> easier to port and verify that the Rust implementation matches the >>> behavior of the C implementation. >> >> I think that's a reasonable concern. So is the plan to have a parallel >> rust implementation of these functions rather than call the C >> implementation from rust? > > Yes, the Rust implementation will be independent of the C > implementation, but will behave the same way. That's why I'm calling > it an interoperable vec as opposed to a compatible vec. Rust can't > call the C ivec functions and C can't call the Rust ivec functions, > but they'll behave the same way. Interesting - I'm curious what the advantage of that is over having rust call the C implementation? I can see you wouldn't want to be calling into C for each ivec.push() call, but checking if there is room to push the new element in rust and calling into C to extend the vector if not should be reasonable and then you don't have to re-implement everything in rust. >>>>> +void ivec_free(void *self_) >>>> >>>> Normally we'd call a like this that free the allocations and >>>> re-initializes the members ivec_clear() >>> >>> In Rust Vec.clear() means to set length to zero, but leaves the >>> allocation alone. The reason why I'm zeroing the struct is to help >>> avoid FFI issues. If not zero then what should the members be set to, >>> to indicate that using the struct is not valid anymore? In Rust an >>> object is freed when it goes out of scope and _cannot_ be accessed >>> afterward. > > Maybe I should call this ivec_drop(). Though the notion of explicitly > freeing an object in Rust is _almost_ nonsense. The way you free > something in Rust is to let it go out of scope. Indeed - which means this wont be a public function in rust and so why do we worry about naming it ivec_clear()? At least ivec_drop() does not conflict with any of the standard function suffixes that we're already using in git. >> I'm aware that Vec::clear() has different semantics (it does what >> strbuf_reset() does). That's unfortunate but this function has different >> semantics to all the other *_free() functions in git. Our coding >> guidelines say >> >> - There are several common idiomatic names for functions performing >> specific tasks on a structure `S`: >> >> - `S_init()` initializes a structure without allocating the >> structure itself. >> >> - `S_release()` releases a structure's contents without freeing the >> structure. >> >> - `S_clear()` is equivalent to `S_release()` followed by `S_init()` >> such that the structure is directly usable after clearing it. When >> `S_clear()` is provided, `S_init()` shall not allocate resources >> that need to be released again. >> >> - `S_free()` releases a structure's contents and frees the >> structure. >> >> As we write more rust code and so wrap more of our existing structs >> we're going to be wrapping C code that uses the definitions above so I >> think we should do the same with struct IVec_*. > > I disagree. IVec isn't a wrapper around an existing struct. So just because it is a new stuct it shouldn't have to follow the existing naming conventions? > ivec is > meant to very closely mimic Rust's Vec while guaranteeing > interoperability. For things like strbuf I haven't conceived of a > solution for that yet. Making ivec diverge from Rust's Vec will result > in POLA violations due to different behavior when refactoring an > IVec to Vec. On the other hand, vec.reset() does not exist so you'd get a compiler error if you forgot to rename those calls when changing from IVec to Vec and the rust code wouldn't be calling ivec.clear(). I'm not sure citing POLA concerns is very convincing as ivec_free() in C is a POLA violation for anyone familiar with git's code base so it's not like there's a choice that avoids that concern. Thanks Phillip >>>>> diff --git a/compat/ivec.h b/compat/ivec.h >>>>> new file mode 100644 >>>>> index 0000000000..654a05c506 >>>>> --- /dev/null >>>>> +++ b/compat/ivec.h >>>>> @@ -0,0 +1,52 @@ >>>>> +#ifndef IVEC_H >>>>> +#define IVEC_H >>>>> + >>>>> +#include >>>> >>>> It would be nice to have some documentation in this header, see the >>>> examples in strvec.h and hashmap.h >>>> >>>>> +#define IVEC_INIT(variable) ivec_init(&(variable), sizeof(*(variable).ptr)) >>>> >>>> This is a bit cumbersome to use compared to our usual *_INIT macros. I'm >>>> struggling to see how we can make it nicer though as DEFINE_IVEC_TYPE >>>> cannot define a per-type initializer macro and I we cannot initialize >>>> the element size without knowing the type. >>> >>> I don't see what's cumbersome about it. Maybe an example use case >>> would clarify things. >> >> It is cumbersome because it separates the initialization from the >> declaration. Normally our *_INIT macros are initializer lists so we can >> write >> >> struct strbuf = STRBUF_INIT; >> >> which keeps the declaration and initialization together. Although >> they're on adjacent lines in your example in real code the >> initialization likely to be separated from the declaration by other >> variable declarations. > > Ah I see what you mean now. I'll experiment with making IVEC_INIT() > work like that. One wrinkle is that STRBUF_INIT is a single concrete > type whereas IVEC_INIT() is meant for generic types. If you can get it to work that would be great, but I can't think of a way of getting it to work for a generic type. Thanks Phillip