Re: [PATCH net-next v3 03/12] net: homa: create shared Homa header files

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Edward Cree <ecree.xilinx@gmail.com>
To: John Ousterhout <ouster@cs.stanford.edu>, netdev@vger.kernel.org
Subject: Re: [PATCH net-next v3 03/12] net: homa: create shared Homa header files
Date: Tue, 17 Dec 2024 06:36:27 +0000	[thread overview]
Message-ID: <8a73091e-5d4a-4802-ffef-a382adbbe88f@gmail.com> (raw)
In-Reply-To: <20241209175131.3839-5-ouster@cs.stanford.edu>

On 09/12/2024 17:51, John Ousterhout wrote:
> oma_impl.h defines "struct homa", which contains overall information
> about the Homa transport, plus various odds and ends that are used
> throughout the Homa implementation.

Should parts of 'struct homa' be per network namespace, rather than
 global, so that in systems hosting multiple containers each netns can
 configure Homa for the way it wants to use it?

> +struct homa_interest {
> +	/**
> +	 * @thread: Thread that would like to receive a message. Will get
> +	 * woken up when a suitable message becomes available.
> +	 */
> +	struct task_struct *thread;
> +
> +	/**
> +	 * @ready_rpc: This is actually a (struct homa_rpc *) identifying the
> +	 * RPC that was found; NULL if no RPC has been found yet. This
> +	 * variable is used for synchronization to handoff the RPC, and
> +	 * must be set only after @locked is set.
> +	 */
> +	atomic_long_t ready_rpc;
> +
> +	/**
> +	 * @locked: Nonzero means that @ready_rpc is locked; only valid
> +	 * if @ready_rpc is non-NULL.
> +	 */
> +	int locked;

These feel weird; what kind of synchronisation is this for and why
 aren't any of Linux's existing locking primitives suitable?  In
 particular the non-typesafe casting of ready_rpc is unpleasant.
I looked at sync.txt and didn't find an explanation, and it wasn't
 obvious from reading homa_register_interests() either.  (Are plain
 writes to int even guaranteed to be ordered wrt the atomics on
 rpc->flags or ready_rpc?)
My best guess from looking at how `thread` is used is that all this
 is somehow simulating a completion?  You shouldn't need to manually
 do stuff like sleeping and waking threads from within something as
 generic as a protocol implementation.

> +	interest->request_links.next = LIST_POISON1;
> +	interest->response_links.next = LIST_POISON1;

Any particular reason why you're opencoding poisoning, rather than
 using the list helpers (which distinguish between a list_head that
 has been inited but never added, so list_empty() returns true, and
 one which has been list_del()ed and thus poisoned)?
It would likely be easier for others to debug any issues that arise
 in Homa if when they see a list_head in an oops or crashdump they
 can relate it to the standard lifecycle.

> +/**
> + * struct homa - Overall information about the Homa protocol implementation.
> + *
> + * There will typically only exist one of these at a time, except during
> + * unit tests.
> + */
> +struct homa {
> +	/**
> +	 * @next_outgoing_id: Id to use for next outgoing RPC request.
> +	 * This is always even: it's used only to generate client-side ids.
> +	 * Accessed without locks.
> +	 */
> +	atomic64_t next_outgoing_id;

Does the ID need to be unique for the whole machine or just per-
 interface?  I would imagine it should be sufficient for the
 (id, source address) or even (id, saddr, sport) tuple to be
 unique.
And are there any security issues here; ought we to do anything
 like TCP does with sequence numbers to try to ensure they aren't
 guessable by an attacker?

> +	/**
> +	 * @throttled_rpcs: Contains all homa_rpcs that have bytes ready
> +	 * for transmission, but which couldn't be sent without exceeding
> +	 * the queue limits for transmission. Manipulate only with "_rcu"
> +	 * functions.
> +	 */
> +	struct list_head throttled_rpcs;

I'm not sure exactly how it works but I believe you can annotate
 the declaration with __rcu to get sparse to enforce this.

> +	/**
> +	 * @next_client_port: A client port number to consider for the
> +	 * next Homa socket; increments monotonically. Current value may
> +	 * be in the range allocated for servers; must check before using.
> +	 * This port may also be in use already; must check.
> +	 */
> +	__u16 next_client_port __aligned(L1_CACHE_BYTES);

Again, does guessability by an attacker pose any security risks
 here?

> +	/**
> +	 * @link_bandwidth: The raw bandwidth of the network uplink, in
> +	 * units of 1e06 bits per second.  Set externally via sysctl.
> +	 */
> +	int link_mbps;

What happens if a machine has two uplinks and someone wants to
 use Homa on both of them?  I wonder if most of the granting and
 pacing part of Homa ought to be per-netdev rather than per-host.
(Though in an SDN case with a bunch of containers issuing their
 RPCs through veths you'd want a Homa-aware bridge that could do
 the SRPT rather than bandwidth sharing, and having everything go
 through a single Homa stack instance does give you that for free.
 But then a VM use-case still needs the clever bridge anyway.)

> +	/**
> +	 * @timeout_ticks: abort an RPC if its silent_ticks reaches this value.
> +	 */
> +	int timeout_ticks;

This feels more like a socket-level option, perhaps?  Just
 thinking out loud.
> +	/**
> +	 * @gso_force_software: A non-zero value will cause Home to perform
> +	 * segmentation in software using GSO; zero means ask the NIC to
> +	 * perform TSO. Set externally via sysctl.
> +	 */

"Home" appears to be a typo.
> +	/**
> +	 * @temp: the values in this array can be read and written with sysctl.
> +	 * They have no officially defined purpose, and are available for
> +	 * short-term use during testing.
> +	 */
> +	int temp[4];

I don't think this belongs in upstream.  At best maybe under an ifdef
 like CONFIG_HOMA_DEBUG?

> +/**
> + * struct homa_skb_info - Additional information needed by Homa for each
> + * outbound DATA packet. Space is allocated for this at the very end of the
> + * linear part of the skb.
> + */
> +struct homa_skb_info {
> +	/**
> +	 * @next_skb: used to link together all of the skb's for a Homa
> +	 * message (in order of offset).
> +	 */
> +	struct sk_buff *next_skb;
> +
> +	/**
> +	 * @wire_bytes: total number of bytes of network bandwidth that
> +	 * will be consumed by this packet. This includes everything,
> +	 * including additional headers added by GSO, IP header, Ethernet
> +	 * header, CRC, preamble, and inter-packet gap.
> +	 */
> +	int wire_bytes;
> +
> +	/**
> +	 * @data_bytes: total bytes of message data across all of the
> +	 * segments in this packet.
> +	 */
> +	int data_bytes;
> +
> +	/** @seg_length: maximum number of data bytes in each GSO segment. */
> +	int seg_length;
> +
> +	/**
> +	 * @offset: offset within the message of the first byte of data in
> +	 * this packet.
> +	 */
> +	int offset;
> +};
> +
> +/**
> + * homa_get_skb_info() - Return the address of Homa's private information
> + * for an sk_buff.
> + * @skb:     Socket buffer whose info is needed.
> + */
> +static inline struct homa_skb_info *homa_get_skb_info(struct sk_buff *skb)
> +{
> +	return (struct homa_skb_info *)(skb_end_pointer(skb)
> +			- sizeof(struct homa_skb_info));
> +}
> +
> +/**
> + * homa_next_skb() - Compute address of Homa's private link field in @skb.
> + * @skb:     Socket buffer containing private link field.
> + *
> + * Homa needs to keep a list of buffers in a message, but it can't use the
> + * links built into sk_buffs because Homa wants to retain its list even
> + * after sending the packet, and the built-in links get used during sending.
> + * Thus we allocate extra space at the very end of the packet's data
> + * area to hold a forward pointer for a list.
> + */
> +static inline struct sk_buff **homa_next_skb(struct sk_buff *skb)
> +{
> +	return (struct sk_buff **)(skb_end_pointer(skb) - sizeof(char *));
> +}

This is confusing — why doesn't homa_next_skb(skb) equal
 &homa_get_skb_info(skb)->next_skb?  Is one used on TX and the other
 on RX, or something?

And could these subtractions be written as first casting to the
 appropriate pointer type and then subtracting 1, instead of
 subtracting sizeof from the unsigned char *end_pointer?
(Particularly as here you've taken sizeof a different kind of
 pointer — I know sizeof(char *) == sizeof(struct sk_buff *), but
 it's still kind of unclean.)

> +
> +/**
> + * homa_set_doff() - Fills in the doff TCP header field for a Homa packet.
> + * @h:     Packet header whose doff field is to be set.
> + * @size:  Size of the "header", bytes (must be a multiple of 4). This
> + *         information is used only for TSO; it's the number of bytes
> + *         that should be replicated in each segment. The bytes after
> + *         this will be distributed among segments.
> + */
> +static inline void homa_set_doff(struct data_header *h, int size)
> +{
> +	h->common.doff = size << 2;

Either put a comment here about the data offset being the high 4
 bits of doff, or use "(size >> 2) << 4" (or both!); at first
 glance this looks like a typo shifting the wrong way.
(TCP avoids this by playing games with bitfields in struct tcphdr.)

> +/**
> + * ipv4_to_ipv6() - Given an IPv4 address, return an equivalent IPv6 address
> + * (an IPv4-mapped one).
> + * @ip4: IPv4 address, in network byte order.
> + */
> +static inline struct in6_addr ipv4_to_ipv6(__be32 ip4)
> +{
> +	struct in6_addr ret = {};
> +
> +	if (ip4 == htonl(INADDR_ANY))
> +		return in6addr_any;
> +	ret.in6_u.u6_addr32[2] = htonl(0xffff);
> +	ret.in6_u.u6_addr32[3] = ip4;
> +	return ret;
> +}
> +
> +/**
> + * ipv6_to_ipv4() - Given an IPv6 address produced by ipv4_to_ipv6, return
> + * the original IPv4 address (in network byte order).
> + * @ip6:  IPv6 address; assumed to be a mapped IPv4 address.
> + */
> +static inline __be32 ipv6_to_ipv4(const struct in6_addr ip6)
> +{
> +	return ip6.in6_u.u6_addr32[3];
> +}
...
> +/**
> + * is_mapped_ipv4() - Return true if an IPv6 address is actually an
> + * IPv4-mapped address, false otherwise.
> + * @x:  The address to check.
> + */
> +static inline bool is_mapped_ipv4(const struct in6_addr x)
> +{
> +	return ((x.in6_u.u6_addr32[0] == 0) &&
> +		(x.in6_u.u6_addr32[1] == 0) &&
> +		(x.in6_u.u6_addr32[2] == htonl(0xffff)));
> +}

These probably belong in some general inet header rather than being
 buried inside Homa.  There's __ipv6_addr_type() but that might be a
 bit heavyweight; also ipv6_addr_v4mapped() and
 ipv6_addr_set_v4mapped() in include/net/ipv6.h.

next prev parent reply	other threads:[~2024-12-17  6:36 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-09 17:51 [PATCH net-next v3 00/12] Begin upstreaming Homa transport protocol John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 01/12] inet: homa: define user-visible API for Homa John Ousterhout
2024-12-09 17:51 ` [PATCH net-next 01/12] net: " John Ousterhout
2024-12-09 17:54   ` John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 02/12] net: homa: define Homa packet formats John Ousterhout
2024-12-12 12:29   ` Edward Cree
2024-12-13 18:32     ` John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 03/12] net: homa: create shared Homa header files John Ousterhout
2024-12-17  6:36   ` Edward Cree [this message]
2024-12-18  5:46     ` John Ousterhout
2024-12-18 12:25       ` Edward Cree
2024-12-19 17:50         ` John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 04/12] net: homa: create homa_pool.h and homa_pool.c John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 05/12] net: homa: create homa_rpc.h and homa_rpc.c John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 06/12] net: homa: create homa_peer.h and homa_peer.c John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 07/12] net: homa: create homa_sock.h and homa_sock.c John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 08/12] net: homa: create homa_incoming.c John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 09/12] net: homa: create homa_outgoing.c John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 10/12] net: homa: create homa_timer.c John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 11/12] net: homa: create homa_plumbing.c homa_utils.c John Ousterhout
2024-12-10  6:08   ` D. Wythe
2024-12-11  1:59     ` John Ousterhout
2024-12-11  5:06       ` D. Wythe
2024-12-11 22:59         ` John Ousterhout
2024-12-11  6:39   ` D. Wythe
2024-12-11 23:01     ` John Ousterhout
2024-12-09 17:51 ` [PATCH net-next v3 12/12] net: homa: create Makefile and Kconfig John Ousterhout
2024-12-11  0:32   ` kernel test robot
2024-12-11 17:29     ` John Ousterhout

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a73091e-5d4a-4802-ffef-a382adbbe88f@gmail.com \
    --to=ecree.xilinx@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=ouster@cs.stanford.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).