From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B64F3137C51 for ; Tue, 20 Feb 2024 19:33:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708457642; cv=none; b=LbUNB9uFC4+9LIiM/AtPIjWikNy9nTKuT2zX6Gf2UISDmYyb2mVKMW0wqSew39lIV0ykUhrTpppSLLoc2jvdDfLW9rArS0uA/MaUOlD0dgM5tMJamRS58Rr8O7UbiLa4uVebhUqg7escSECpigYlcQ6hmqdQLs0jIFpCDTXY06o= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708457642; c=relaxed/simple; bh=MSsWNZbrSA8jmwOTwY01KkTTLNM6KGirzTg57mM/DHc=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=bMWbyxg3AUbE56we4dHdMUnnHoh2g4+tqrXMcA1RJGe3QDAtZoHK9u1pVLJhO2yKCCE6u9fFwP0+dbvBymbOqzQYDNft1y94AQ4VNqaJPN8+Ea6HJeAh9pTiATCCr2tNxcVomVvcRuwd7hzchn9hVogs02MoiB38Dt6jsXee2CQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=W4ak81ii; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="W4ak81ii" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1708457638; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=d2+K9SYZnDq2b2vxFoXzss6f5L3wcHVChuNq2JQkC0k=; b=W4ak81iiUroQCB+SEnxHqZc8n2HvxeZ0BW0QS+R25uazlr4Us9p5HAqGDAuXV1h4/hE3j2 Etx7L69lE0/Npnk12Zfxvvs7OHuofy7BNKuCDs6HxTc20Ru9PnNcxcCewV/sxqQxbA4lMu IYUpCZ4L0i0oKGPiUybCVP3yISzqKVE= Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com [209.85.218.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-516-cmSz0x-FP2aKsLGGCPPIZA-1; Tue, 20 Feb 2024 14:33:56 -0500 X-MC-Unique: cmSz0x-FP2aKsLGGCPPIZA-1 Received: by mail-ej1-f69.google.com with SMTP id a640c23a62f3a-a30f9374db7so698554266b.0 for ; Tue, 20 Feb 2024 11:33:56 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708457635; x=1709062435; h=content-transfer-encoding:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=d2+K9SYZnDq2b2vxFoXzss6f5L3wcHVChuNq2JQkC0k=; b=FUHqjHtRcRG6D3IBpmhqFK7o0nrbdTDreBFWU6tkMEOd4SZJVYJiR7YmfcT9SXEMsa dbeq3BNevjtpcfGW/Opy85oPeoleJ2O6GQreFkW+Lsa40XEDP5fwpwD95kflHyPsQYhW ew7XGVd2qCSuW+nSUl60N1omOmwTzBjKPPsyY44m3FaTFdBGKUqtqeSEwDjKlfvPZCPR dxmhyLVTEA8tiGB6Y0ktaQIWTzHfcYlIwG9eszY3diOu1xLubEVFlcSA53uFDRXBHYPT r6lVW3+3nyyEoVaTGxr8jeecV02oo4gOfH6w20vO2c+vXsjC89S16bY1t80bRZ5ezZgY ok3g== X-Forwarded-Encrypted: i=1; AJvYcCWC5Iv0KU5qpXnswzJpYw/6wzq+ARHWus5sIyk7cd+8Y+xAu9MoI6/CArf+e2jAfexOQEE7C/vpsY0KsUcwO6dHxWCqtQWv X-Gm-Message-State: AOJu0YxZczHz/ao9sVpdtb8VcvsRjztUH16uweCCRNr8QIddEv9ETNsu asRimMLLaEhQ4GP5Y4saLXqfOJoVTDLFiKVef9I3BMda9H8vXhVGIPYq4vgqkD9wc8l92ddrdod mKysOFlesD0Sy2Bn/4dq5jXF8t5jogyrna0TrZ6rz1zUz6usHJNdETQ== X-Received: by 2002:a17:907:7ea9:b0:a3e:b57f:2b8a with SMTP id qb41-20020a1709077ea900b00a3eb57f2b8amr6030605ejc.10.1708457635381; Tue, 20 Feb 2024 11:33:55 -0800 (PST) X-Google-Smtp-Source: AGHT+IGbxIJ1yuGOFI6iLmW9ulpsaRCDfsBBUAz1piwEm4OhcKvAQVntnoQjtfgklYIhoIN0YmrB+Q== X-Received: by 2002:a17:907:7ea9:b0:a3e:b57f:2b8a with SMTP id qb41-20020a1709077ea900b00a3eb57f2b8amr6030569ejc.10.1708457635000; Tue, 20 Feb 2024 11:33:55 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk ([45.145.92.2]) by smtp.gmail.com with ESMTPSA id pk27-20020a170906d7bb00b00a3ee20b00d0sm1254310ejb.4.2024.02.20.11.33.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Feb 2024 11:33:53 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 4E1A010F63D2; Tue, 20 Feb 2024 20:33:53 +0100 (CET) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Paolo Abeni , Daniel Borkmann , Alexei Starovoitov , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer Cc: Alexander Lobakin , Eric Dumazet , bpf@vger.kernel.org, netdev@vger.kernel.org Subject: Re: [PATCH net-next 2/3] bpf: test_run: Use system page pool for XDP live frame mode In-Reply-To: <83e7faeb4a241a00053fc71dbb18d1dbc7c0fac6.camel@redhat.com> References: <20240215132634.474055-1-toke@redhat.com> <20240215132634.474055-3-toke@redhat.com> <59c022bf-4cc4-850f-f8ab-3b8aab36f958@iogearbox.net> <87frxn1dnq.fsf@toke.dk> <83e7faeb4a241a00053fc71dbb18d1dbc7c0fac6.camel@redhat.com> X-Clacks-Overhead: GNU Terry Pratchett Date: Tue, 20 Feb 2024 20:33:53 +0100 Message-ID: <877ciz0w2m.fsf@toke.dk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Paolo Abeni writes: > On Tue, 2024-02-20 at 14:14 +0100, Toke H=C3=B8iland-J=C3=B8rgensen wrote: >> Paolo Abeni writes: >>=20 >> > On Tue, 2024-02-20 at 10:06 +0100, Daniel Borkmann wrote: >> > > On 2/15/24 2:26 PM, Toke H=C3=B8iland-J=C3=B8rgensen wrote: >> > > > The BPF_TEST_RUN code in XDP live frame mode creates a new page po= ol >> > > > each time it is called and uses that to allocate the frames used f= or the >> > > > XDP run. This works well if the syscall is used with a high repeti= tions >> > > > number, as it allows for efficient page recycling. However, if use= d with >> > > > a small number of repetitions, the overhead of creating and tearin= g down >> > > > the page pool is significant, and can even lead to system stalls i= f the >> > > > syscall is called in a tight loop. >> > > >=20 >> > > > Now that we have a persistent system page pool instance, it becomes >> > > > pretty straight forward to change the test_run code to use it. The= only >> > > > wrinkle is that we can no longer rely on a custom page init callba= ck >> > > > from page_pool itself; instead, we change the test_run code to wri= te a >> > > > random cookie value to the beginning of the page as an indicator t= hat >> > > > the page has been initialised and can be re-used without copying t= he >> > > > initial data again. >> > > >=20 >> > > > Signed-off-by: Toke H=C3=B8iland-J=C3=B8rgensen >> > >=20 >> > > [...] >> > > > - >> > > > /* We create a 'fake' RXQ referencing the original dev, but wit= h an >> > > > * xdp_mem_info pointing to our page_pool >> > > > */ >> > > > xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0); >> > > > - xdp->rxq.mem.type =3D MEM_TYPE_PAGE_POOL; >> > > > - xdp->rxq.mem.id =3D pp->xdp_mem_id; >> > > > + xdp->rxq.mem.type =3D MEM_TYPE_PAGE_POOL; /* mem id is set per-f= rame below */ >> > > > xdp->dev =3D orig_ctx->rxq->dev; >> > > > xdp->orig_ctx =3D orig_ctx; >> > > >=20=20=20 >> > > > + /* We need a random cookie for each run as pages can stick around >> > > > + * between runs in the system page pool >> > > > + */ >> > > > + get_random_bytes(&xdp->cookie, sizeof(xdp->cookie)); >> > > > + >> > >=20 >> > > So the assumption is that there is only a tiny chance of collisions = with >> > > users outside of xdp test_run. If they do collide however, you'd lea= k data. >> >=20 >> > Good point. @Toke: what is the worst-case thing that could happen in >> > case a page is recycled from another pool's user? >> >=20 >> > could we possibly end-up matching the cookie for a page containing >> > 'random' orig_ctx/ctx, so that bpf program later tries to access >> > equally random ptrs? >>=20 >> Well, yes, if there's a collision in the cookie value we'll end up >> basically dereferencing garbage pointer values, with all the badness >> that ensues (most likely just a crash, but system compromise is probably >> also possible in such a case). >>=20 >> A 64-bit value is probably too small to be resistant against random >> collisions in a "protect global data across the internet" type scenario >> (for instance, a 64-bit cryptographic key is considered weak). However, >> in this case the collision domain is only for the lifetime of the >> running system, and each cookie value only stays valid for the duration >> of a single syscall (seconds, at most), so I figured it was acceptable. >>=20 >> We could exclude all-zeros as a valid cookie value (and also anything >> that looks as a valid pointer), but that only removes a few of the >> possible random collision values, so if we're really worried about >> random collisions of 64-bit numbers, I think a better approach would be >> to just make the cookie a 128-bit value instead. I can respin with that >> if you prefer? :) > > I must admit that merging a code that will allow trashing the kernel - > even with a very low probability - is quite scaring to me. > > How much relevant is the recycle case optimization? Could removing > completely that optimization be considered? Did a quick test of this, and skipping the recycling eats ~12.5% performance, so I don't think getting rid of it is a good solution. However, increasing the cookie size to 128 bits makes no performance difference (everything stays in the same cache lines). If we do that, the collision probability enters "won't happen before the heat death of the universe" territory, so I don't think there's any real concern that this will happen. I'll respin with the bigger cookie size (and add that patch Olek suggested to remove the init callback from the page pool code). -Toke