Date: Fri, 28 Jun 2019 13:50:15 +0200
From: Greg Kurz
To: Christian Schoenebeck via Qemu-devel
Cc: "Daniel P. Berrangé", Christian Schoenebeck, Antonios Motakis
Subject: Re: [Qemu-devel] [PATCH v4 5/5] 9p: Use variable length suffixes for inode remapping
Message-ID: <20190628135015.2d1618cf@bahia.lan>

On Wed, 26 Jun 2019 20:52:09 +0200
Christian Schoenebeck via Qemu-devel wrote:

> Use variable length suffixes for inode remapping instead of the fixed 16
> bit size prefixes before. With this change the inode numbers on guest will
> typically be much smaller (e.g. around >2^1 .. >2^7 instead of >2^48 with
> the previous fixed size inode remapping.
>
> Additionally this solution should be more efficient, since inode numbers in
> practice can take almost their entire 64 bit range on guest as well. Which
> might also be beneficial for nested virtualization.
>
> The "Exponential Golomb" algorithm is used as basis for generating the
> variable length suffixes. The algorithm has a paramter k which controls the
> distribution of bits on increasing indeces (minimum bits at low index vs.
> maximum bits at high index). With k=0 the generated suffixes look like:
>
> Index Dec/Bin -> Generated Suffix Bin
> 1 [1] -> [1] (1 bits)
> 2 [10] -> [010] (3 bits)
> 3 [11] -> [110] (3 bits)
> 4 [100] -> [00100] (5 bits)
> 5 [101] -> [10100] (5 bits)
> 6 [110] -> [01100] (5 bits)
> 7 [111] -> [11100] (5 bits)
> 8 [1000] -> [0001000] (7 bits)
> 9 [1001] -> [1001000] (7 bits)
> 10 [1010] -> [0101000] (7 bits)
> 11 [1011] -> [1101000] (7 bits)
> 12 [1100] -> [0011000] (7 bits)
> ...
> 65533 [1111111111111101] -> [1011111111111111000000000000000] (31 bits)
> 65534 [1111111111111110] -> [0111111111111111000000000000000] (31 bits)
> 65535 [1111111111111111] -> [1111111111111111000000000000000] (31 bits)
> Hence minBits=1 maxBits=31
>
> And with k=5 they would look like:
>
> Index Dec/Bin -> Generated Suffix Bin
> 1 [1] -> [000001] (6 bits)
> 2 [10] -> [100001] (6 bits)
> 3 [11] -> [010001] (6 bits)
> 4 [100] -> [110001] (6 bits)
> 5 [101] -> [001001] (6 bits)
> 6 [110] -> [101001] (6 bits)
> 7 [111] -> [011001] (6 bits)
> 8 [1000] -> [111001] (6 bits)
> 9 [1001] -> [000101] (6 bits)
> 10 [1010] -> [100101] (6 bits)
> 11 [1011] -> [010101] (6 bits)
> 12 [1100] -> [110101] (6 bits)
> ...
> 65533 [1111111111111101] -> [0011100000000000100000000000] (28 bits)
> 65534 [1111111111111110] -> [1011100000000000100000000000] (28 bits)
> 65535 [1111111111111111] -> [0111100000000000100000000000] (28 bits)
> Hence minBits=6 maxBits=28
>

IIUC, this k control parameter should be constant for the lifetime of
QIDs. So it all boils down to choosing a _good_ value that would cover
most scenarios, right?

For now, I just have some _cosmetic_ remarks.
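To make the quoted tables easier to check by hand, here is a minimal sketch
of how such suffixes can be derived with integer arithmetic only. It follows
the formula used in the patch (value = n + 2^k - 1, then mirror the bits of
the resulting prefix), but the names Suffix, bit_length(), reverse_bits() and
golomb_suffix() are hypothetical and not part of the patch, which relies on
log2() and mirror64bit() instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's VariLenAffix (suffix case only). */
typedef struct {
    uint64_t value; /* suffix, stored in the lowest `bits` bits */
    int bits;       /* suffix length in bits */
} Suffix;

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint64_t v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Reverse the order of the lowest `bits` bits of v. */
static uint64_t reverse_bits(uint64_t v, int bits)
{
    uint64_t r = 0;
    for (int i = 0; i < bits; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

/*
 * Exp-Golomb code for index n (1, 2, 3, ...) with parameter k,
 * mirrored so that it can be used as a suffix instead of a prefix.
 */
static Suffix golomb_suffix(uint64_t n, int k)
{
    assert(n >= 1 && k >= 0 && k < 32);

    uint64_t value = n + ((uint64_t)1 << k) - 1;
    int len = bit_length(value);
    int pad = len - 1 - k;          /* leading zero bits of the prefix */
    Suffix s;

    s.bits = len + (pad > 0 ? pad : 0);
    s.value = reverse_bits(value, s.bits);
    return s;
}
```

With this sketch, golomb_suffix(5, 0) gives 10100 (5 bits) and
golomb_suffix(3, 5) gives 010001 (6 bits), which matches the rows quoted
above.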
> Signed-off-by: Christian Schoenebeck
> ---
>  hw/9pfs/9p.c | 267 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  hw/9pfs/9p.h |  67 ++++++++++++++-
>  2 files changed, 312 insertions(+), 22 deletions(-)
>
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index e6e410972f..46c9f11384 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -26,6 +26,7 @@
>  #include "migration/blocker.h"
>  #include "sysemu/qtest.h"
>  #include "qemu/xxhash.h"
> +#include
>
>  int open_fd_hw;
>  int total_open_fd;
> @@ -572,6 +573,123 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
>      P9_STAT_MODE_NAMED_PIPE | \
>      P9_STAT_MODE_SOCKET)
>
> +#if P9_VARI_LENGTH_INODE_SUFFIXES

The numerous locations guarded by P9_VARI_LENGTH_INODE_SUFFIXES
really obfuscate the code, and don't ease review (for me at least).
And anyway, if we go for variable length suffixes, we won't keep
the fixed length prefix code.

> +
> +/* Mirrors all bits of a byte. So e.g. binary 10100000 would become 00000101. */
> +static inline uint8_t mirror8bit(uint8_t byte) {

From CODING_STYLE:

4. Block structure

[...]

for reasons of tradition and clarity it comes on a line by itself:

    void a_function(void)
    {
        do_something();
    }

> +    return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;
> +}
> +
> +/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */
> +static inline uint64_t mirror64bit(uint64_t value) {

Ditto.

> +    return ((uint64_t)mirror8bit( value & 0xff) << 56) |
> +           ((uint64_t)mirror8bit((value >> 8) & 0xff) << 48) |
> +           ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) |
> +           ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) |
> +           ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) |
> +           ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) |
> +           ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) |
> +           ((uint64_t)mirror8bit((value >> 56) & 0xff) ) ;
> +}
> +
> +/* Parameter k for the Exponential Golomb algorihm to be used.
> + *
> + * The smaller this value, the smaller the minimum bit count for the Exp.
> + * Golomb generated affixes will be (at lowest index) however for the
> + * price of having higher maximum bit count of generated affixes (at highest
> + * index). Likewise increasing this parameter yields in smaller maximum bit
> + * count for the price of having higher minimum bit count.

Forgive my laziness but what are the benefits of a smaller or larger
value, in terms of user experience?

> + */
> +#define EXP_GOLOMB_K 0
> +
> +# if !EXP_GOLOMB_K
> +
> +/** @brief Exponential Golomb algorithm limited to the case k=0.
> + *

It doesn't really help to have a special implementation for k=0. The
resulting function is nearly identical to the general case. It is likely
that the compiler can optimize it and generate the same code.

> + * See expGolombEncode() below for details.
> + *
> + * @param n - natural number (or index) of the prefix to be generated
> + *            (1, 2, 3, ...)
> + */
> +static VariLenAffix expGolombEncodeK0(uint64_t n) {
> +    const int bits = (int) log2(n) + 1;
> +    return (VariLenAffix) {
> +        .type = AffixType_Prefix,
> +        .value = n,
> +        .bits = bits + bits - 1
> +    };
> +}
> +
> +# else
> +
> +/** @brief Exponential Golomb algorithm for arbitrary k (including k=0).
> + *
> + * The Exponential Golomb algorithm generates @b prefixes (@b not suffixes!)
> + * with growing length and with the mathematical property of being
> + * "prefix-free". The latter means the generated prefixes can be prepended
> + * in front of arbitrary numbers and the resulting concatenated numbers are
> + * guaranteed to be always unique.
> + *
> + * This is a minor adjustment to the original Exp. Golomb algorithm in the
> + * sense that lowest allowed index (@param n) starts with 1, not with zero.
> + *
> + * @param n - natural number (or index) of the prefix to be generated
> + *            (1, 2, 3, ...)
> + * @param k - parameter k of Exp. Golomb algorithm to be used
> + *            (see comment on EXP_GOLOMB_K macro for details about k)
> + */
> +static VariLenAffix expGolombEncode(uint64_t n, int k) {

Function.

> +    const uint64_t value = n + (1 << k) - 1;
> +    const int bits = (int) log2(value) + 1;
> +    return (VariLenAffix) {
> +        .type = AffixType_Prefix,
> +        .value = value,
> +        .bits = bits + MAX((bits - 1 - k), 0)
> +    };
> +}
> +
> +# endif /* !EXP_GOLOMB_K */
> +
> +/** @brief Converts a suffix into a prefix, or a prefix into a suffix.
> + *
> + * What this function does is actually mirroring all bits of the affix value,

Drop the "What this function does..." wording and use an imperative style
instead, i.e. "Mirror all bits of..."

> + * with the purpose to preserve respectively the mathematical "prefix-free"
> + * or "suffix-free" property after the conversion.
> + *
> + * In other words: if a passed prefix is suitable to create unique numbers,
> + * then the returned suffix is suitable to create unique numbers as well
> + * (and vice versa).
> + */
> +static VariLenAffix invertAffix(const VariLenAffix* affix) {

Function.

> +    return (VariLenAffix) {
> +        .type = (affix->type == AffixType_Suffix) ? AffixType_Prefix : AffixType_Suffix,
> +        .value = mirror64bit(affix->value) >> ((sizeof(affix->value) * 8) - affix->bits),
> +        .bits = affix->bits
> +    };
> +}
> +
> +/** @brief Generates suffix numbers with "suffix-free" property.
> + *
> + * This is just a wrapper function on top of the Exp. Golomb algorithm.
> + *
> + * Since the Exp. Golomb algorithm generates prefixes, but we need suffixes,
> + * this function converts the Exp. Golomb prefixes into appropriate suffixes
> + * which are still suitable for generating unique numbers.
> + *
> + * @param n - natural number (or index) of the suffix to be generated
> + *            (1, 2, 3, ...)
> + */
> +static VariLenAffix affixForIndex(uint64_t index) {

Function.

> +    VariLenAffix prefix;
> +    #if EXP_GOLOMB_K
> +    prefix = expGolombEncode(index, EXP_GOLOMB_K);
> +    #else
> +    prefix = expGolombEncodeK0(index);
> +    #endif
> +    return invertAffix(&prefix); /* convert prefix to suffix */
> +}
> +
> +#endif /* P9_VARI_LENGTH_INODE_SUFFIXES */
>
>  /* creative abuse of tb_hash_func7, which is based on xxhash */
>  static uint32_t qpp_hash(QppEntry e)
> @@ -584,13 +702,23 @@ static uint32_t qpf_hash(QpfEntry e)
>      return qemu_xxhash7(e.ino, e.dev, 0, 0, 0);
>  }
>
> -static bool qpp_lookup_func(const void *obj, const void *userp)
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +
> +static bool qpd_cmp_func(const void *obj, const void *userp)
> +{
> +    const QpdEntry *e1 = obj, *e2 = userp;
> +    return e1->dev == e2->dev;
> +}
> +
> +#endif /* P9_VARI_LENGTH_INODE_SUFFIXES */
> +
> +static bool qpp_cmp_func(const void *obj, const void *userp)
>  {
>      const QppEntry *e1 = obj, *e2 = userp;
>      return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
>  }
>
> -static bool qpf_lookup_func(const void *obj, const void *userp)
> +static bool qpf_cmp_func(const void *obj, const void *userp)
>  {
>      const QpfEntry *e1 = obj, *e2 = userp;
>      return e1->dev == e2->dev && e1->ino == e2->ino;
> @@ -607,6 +735,58 @@ static void qp_table_destroy(struct qht *ht)
>      qht_destroy(ht);
>  }
>
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +
> +/*
> + * Returns how many (high end) bits of inode numbers of the passed fs
> + * device shall be used (in combination with the device number) to
> + * generate hash values for qpp_table entries.
> + *
> + * This function is required if variable length suffixes are used for inode
> + * number mapping on guest level. Since a device may end up having multiple
> + * entries in qpp_table, each entry most probably with a different suffix
> + * length, we thus need this function in conjunction with qpd_table to
> + * "agree" about a fix amount of bits (per device) to be always used for
> + * generating hash values for the purpose of accessing qpp_table in order
> + * get consistent behaviour when accessing qpp_table.
> + */
> +static int qid_inode_prefix_hash_bits(V9fsPDU *pdu, dev_t dev)
> +{
> +    QpdEntry lookup = {
> +        .dev = dev
> +    }, *val;
> +    uint32_t hash = dev;
> +    VariLenAffix affix;
> +
> +    val = qht_lookup(&pdu->s->qpd_table, &lookup, hash);
> +    if (!val) {
> +        val = g_malloc0(sizeof(QpdEntry));
> +        *val = lookup;
> +        affix = affixForIndex(pdu->s->qp_affix_next);
> +        val->prefix_bits = affix.bits;
> +        qht_insert(&pdu->s->qpd_table, val, hash, NULL);
> +        pdu->s->qp_ndevices++;
> +    }
> +    return val->prefix_bits;
> +}
> +
> +#endif /* P9_VARI_LENGTH_INODE_SUFFIXES */
> +
> +/** @brief Slow / full mapping host inode nr -> guest inode nr.
> + *
> + * This function performs a slower and much more costly remapping of an
> + * original file inode number on host to an appropriate different inode
> + * number on guest. For every (dev, inode) combination on host a new
> + * sequential number is generated, cached and exposed as inode number on
> + * guest.
> + *
> + * This is just a "last resort" fallback solution if the much faster/cheaper
> + * qid_path_prefixmap() failed. In practice this slow / full mapping is not
> + * expected ever to be used at all though.
> + *
> + * @see qid_path_prefixmap() for details
> + *
> + */
>  static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
>                              uint64_t *path)
>  {
> @@ -615,11 +795,9 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
>          .ino = stbuf->st_ino
>      }, *val;
>      uint32_t hash = qpf_hash(lookup);
> -
> -    /* most users won't need the fullmap, so init the table lazily */
> -    if (!pdu->s->qpf_table.map) {
> -        qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE);
> -    }
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +    VariLenAffix affix;
> +#endif
>

Move this declaration to the beginning of the block.

>      val = qht_lookup(&pdu->s->qpf_table, &lookup, hash);
>
> @@ -633,8 +811,16 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
>          *val = lookup;
>
>          /* new unique inode and device combo */
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +        affix = affixForIndex(
> +            1ULL << (sizeof(pdu->s->qp_affix_next) * 8)
> +        );
> +        val->path = (pdu->s->qp_fullpath_next++ << affix.bits) | affix.value;
> +        pdu->s->qp_fullpath_next &= ((1ULL << (64 - affix.bits)) - 1);
> +#else
>          val->path = pdu->s->qp_fullpath_next++;
>          pdu->s->qp_fullpath_next &= QPATH_INO_MASK;
> +#endif
>          qht_insert(&pdu->s->qpf_table, val, hash, NULL);
>      }
>
> @@ -642,42 +828,71 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
>      return 0;
>  }
>
> -/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
> +/** @brief Quick mapping host inode nr -> guest inode nr.
> + *
> + * This function performs quick remapping of an original file inode number
> + * on host to an appropriate different inode number on guest. This remapping
> + * of inodes is required to avoid inode nr collisions on guest which would
> + * happen if the 9p export contains more than 1 exported file system (or
> + * more than 1 file system data set), because unlike on host level where the
> + * files would have different device nrs, all files exported by 9p would
> + * share the same device nr on guest (the device nr of the virtual 9p device
> + * that is).
> + *
> + * if P9_VARI_LENGTH_INODE_SUFFIXES == 0 :
> + * stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
>   * to a unique QID path (64 bits). To avoid having to map and keep track
>   * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
>   * the device id to the 16 highest bits of the QID path. The 48 lowest bits
>   * of the QID path equal to the lowest bits of the inode number.
>   *
> - * This takes advantage of the fact that inode number are usually not
> - * random but allocated sequentially, so we have fewer items to keep
> - * track of.

Hmm... why drop this comment?

> + * if P9_VARI_LENGTH_INODE_SUFFIXES == 1 :
> + * Instead of fixed size (16 bit) generated prefix, we use variable size
> + * suffixes instead. See comment on P9_VARI_LENGTH_INODE_SUFFIXES for
> + * details.
>   */
>  static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
>                                uint64_t *path)
>  {
> +#if P9_VARI_LENGTH_INODE_SUFFIXES

Starting from here, the patch has too many P9_VARI_LENGTH_INODE_SUFFIXES
guards for my laziness and available time... I'd rather wait for the next
round where both approaches don't coexist.

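For anyone trying to follow the two schemes without the #if guards, the
path computation described in the comment quoted above boils down to
something like the sketch below. The helper names are made up for
illustration and do not exist in the patch, and the hash table lookups that
assign a prefix or suffix to each device are left out.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fixed length scheme (guards disabled).
 * A 16 bit prefix replaces the 16 highest bits of the inode number.
 */
static uint64_t qid_path_fixed_prefix(uint16_t prefix, uint64_t ino)
{
    const uint64_t ino_mask = ((uint64_t)1 << 48) - 1;

    return ((uint64_t)prefix << 48) | (ino & ino_mask);
}

/*
 * Hypothetical helper: variable length scheme (guards enabled).
 * The inode number is shifted up and the suffix fills the low bits.
 * suffix_bits is assumed to be in the range 1..63.
 */
static uint64_t qid_path_var_suffix(uint64_t ino, uint64_t suffix,
                                    int suffix_bits)
{
    return (ino << suffix_bits) | suffix;
}
```

With the first two k=0 suffixes from the commit message, inode 42 would map
to 85 on the first device (suffix 1, 1 bit) and to 338 on the second one
(suffix 010, 3 bits), whereas the fixed scheme always lands above 2^48.
Note that the shift in the second helper discards the highest suffix_bits
bits of the inode number, which, as far as I can tell, is why the quoted
code also feeds those high bits into the qpp_table lookup.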
Cheers,

--
Greg

> +    const int ino_hash_bits = qid_inode_prefix_hash_bits(pdu, stbuf->st_dev);
> +#endif
>      QppEntry lookup = {
>          .dev = stbuf->st_dev,
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +        .ino_prefix = (uint16_t) (stbuf->st_ino >> (64-ino_hash_bits))
> +#else
>          .ino_prefix = (uint16_t) (stbuf->st_ino >> 48)
> +#endif
>      }, *val;
>      uint32_t hash = qpp_hash(lookup);
>
>      val = qht_lookup(&pdu->s->qpp_table, &lookup, hash);
>
>      if (!val) {
> -        if (pdu->s->qp_prefix_next == 0) {
> -            /* we ran out of prefixes */
> +        if (pdu->s->qp_affix_next == 0) {
> +            /* we ran out of affixes */
>              return -ENFILE;
>          }
>
>          val = g_malloc0(sizeof(QppEntry));
>          *val = lookup;
>
> -        /* new unique inode prefix and device combo */
> -        val->qp_prefix = pdu->s->qp_prefix_next++;
> +        /* new unique inode affix and device combo */
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +        val->qp_affix_index = pdu->s->qp_affix_next++;
> +        val->qp_affix = affixForIndex(val->qp_affix_index);
> +#else
> +        val->qp_affix = pdu->s->qp_affix_next++;
> +#endif
>          qht_insert(&pdu->s->qpp_table, val, hash, NULL);
>      }
> -
> -    *path = ((uint64_t)val->qp_prefix << 48) | (stbuf->st_ino & QPATH_INO_MASK);
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +    /* assuming generated affix to be suffix type, not prefix */
> +    *path = (stbuf->st_ino << val->qp_affix.bits) | val->qp_affix.value;
> +#else
> +    *path = ((uint64_t)val->qp_affix << 48) | (stbuf->st_ino & QPATH_INO_MASK);
> +#endif
>      return 0;
>  }
>
> @@ -3799,9 +4014,17 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
>
>      s->dev_id = 0;
>
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +    qht_init(&s->qpd_table, qpd_cmp_func, 1, QHT_MODE_AUTO_RESIZE);
> +#endif
> +    /* most users won't need the fullmap, so init the table lazily */
> +    qht_init(&s->qpf_table, qpf_cmp_func, 1 << 16, QHT_MODE_AUTO_RESIZE);
>      /* QID path hash table. 1 entry ought to be enough for anybody ;) */
> -    qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE);
> -    s->qp_prefix_next = 1; /* reserve 0 to detect overflow */
> +    qht_init(&s->qpp_table, qpp_cmp_func, 1, QHT_MODE_AUTO_RESIZE);
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +    s->qp_ndevices = 0;
> +#endif
> +    s->qp_affix_next = 1; /* reserve 0 to detect overflow */
>      s->qp_fullpath_next = 1;
>
>      s->ctx.fst = &fse->fst;
> @@ -3817,6 +4040,9 @@ out:
>          }
>          g_free(s->tag);
>          g_free(s->ctx.fs_root);
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +        qp_table_destroy(&s->qpd_table);
> +#endif
>          qp_table_destroy(&s->qpp_table);
>          qp_table_destroy(&s->qpf_table);
>          v9fs_path_free(&path);
> @@ -3831,6 +4057,9 @@ void v9fs_device_unrealize_common(V9fsState *s, Error **errp)
>      }
>      fsdev_throttle_cleanup(s->ctx.fst);
>      g_free(s->tag);
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +    qp_table_destroy(&s->qpd_table);
> +#endif
>      qp_table_destroy(&s->qpp_table);
>      qp_table_destroy(&s->qpf_table);
>      g_free(s->ctx.fs_root);
> diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> index 2b74561030..a94272a504 100644
> --- a/hw/9pfs/9p.h
> +++ b/hw/9pfs/9p.h
> @@ -236,13 +236,68 @@ struct V9fsFidState
>      V9fsFidState *rclm_lst;
>  };
>
> -#define QPATH_INO_MASK (((unsigned long)1 << 48) - 1)
> +/*
> + * Defines how inode numbers from host shall be remapped on guest.
> + *
> + * When this compile time option is disabled then fixed length (16 bit)
> + * prefix values are used for all inode numbers on guest level. Accordingly
> + * guest's inode numbers will be quite large (>2^48).
> + *
> + * If this option is enabled then variable length suffixes will be used for
> + * guest's inode numbers instead which usually yields in much smaller inode
> + * numbers on guest level (typically around >2^1 .. >2^7).
> + */
> +#define P9_VARI_LENGTH_INODE_SUFFIXES 1
> +
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +
> +typedef enum AffixType_t {
> +    AffixType_Prefix,
> +    AffixType_Suffix, /* A.k.a. postfix. */
> +} AffixType_t;
> +
> +/** @brief Unique affix of variable length.
> + *
> + * An affix is (currently) either a suffix or a prefix, which is either
> + * going to be prepended (prefix) or appended (suffix) with some other
> + * number for the goal to generate unique numbers. Accordingly the
> + * suffixes (or prefixes) we generate @b must all have the mathematical
> + * property of being suffix-free (or prefix-free in case of prefixes)
> + * so that no matter what number we concatenate the affix with, that we
> + * always reliably get unique numbers as result after concatenation.
> + */
> +typedef struct VariLenAffix {
> +    AffixType_t type; /* Whether this affix is a suffix or a prefix. */
> +    uint64_t value; /* Actual numerical value of this affix. */
> +    int bits; /* Lenght of the affix, that is how many (of the lowest) bits of @c value must be used for appending/prepending this affix to its final resulting, unique number. */
> +} VariLenAffix;
> +
> +#endif /* P9_VARI_LENGTH_INODE_SUFFIXES */
> +
> +#if !P9_VARI_LENGTH_INODE_SUFFIXES
> +# define QPATH_INO_MASK (((unsigned long)1 << 48) - 1)
> +#endif
> +
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +
> +/* See qid_inode_prefix_hash_bits(). */
> +typedef struct {
> +    dev_t dev; /* FS device on host. */
> +    int prefix_bits; /* How many (high) bits of the original inode number shall be used for hashing. */
> +} QpdEntry;
> +
> +#endif /* P9_VARI_LENGTH_INODE_SUFFIXES */
>
>  /* QID path prefix entry, see stat_to_qid */
>  typedef struct {
>      dev_t dev;
>      uint16_t ino_prefix;
> -    uint16_t qp_prefix;
> +    #if P9_VARI_LENGTH_INODE_SUFFIXES
> +    uint32_t qp_affix_index;
> +    VariLenAffix qp_affix;
> +    #else
> +    uint16_t qp_affix;
> +    #endif
>  } QppEntry;
>
>  /* QID path full entry, as above */
> @@ -274,9 +329,15 @@ struct V9fsState
>      V9fsConf fsconf;
>      V9fsQID root_qid;
>      dev_t dev_id;
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +    struct qht qpd_table;
> +#endif
>      struct qht qpp_table;
>      struct qht qpf_table;
> -    uint16_t qp_prefix_next;
> +#if P9_VARI_LENGTH_INODE_SUFFIXES
> +    uint64_t qp_ndevices; /* Amount of entries in qpd_table. */
> +#endif
> +    uint16_t qp_affix_next;
>      uint64_t qp_fullpath_next;
>  };
>
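
As a footnote on the "suffix-free" property that the VariLenAffix comment
relies on: it simply means that no generated suffix is itself the tail end
of another generated suffix. A minimal way to check a pair of codes could
look like the sketch below; is_suffix_of() is a hypothetical helper, not
something the patch provides.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if code a (a_bits wide) is identical
 * to the lowest a_bits bits of code b (b_bits wide), i.e. a is a suffix
 * of b. Widths are assumed to be in the range 1..64.
 */
static bool is_suffix_of(uint64_t a, int a_bits, uint64_t b, int b_bits)
{
    uint64_t mask;

    if (a_bits < 1 || a_bits > 64 || a_bits > b_bits) {
        return false;
    }
    mask = (a_bits == 64) ? UINT64_MAX : (((uint64_t)1 << a_bits) - 1);
    return (b & mask) == a;
}
```

For the k=0 table in the commit message, [1] is not a suffix of [010] and
[010] is not a suffix of [00100], so appending them to arbitrary inode
numbers cannot produce the same result twice. Plain binary counting does
not have this property: [1] is a suffix of [11].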