From: Jarek Poplawski <jarkao2@o2.pl>
To: Radu Rendec <radu.rendec@ines.ro>
Cc: netdev@vger.kernel.org, jamal <hadi@cyberus.ca>
Subject: Re: Endianness problem with u32 classifier hash masks
Date: Fri, 02 Nov 2007 18:31:29 +0100 [thread overview]
Message-ID: <472B5EF1.4020206@o2.pl> (raw)
In-Reply-To: <1193939701.2987.82.camel@localhost.localdomain>
Radu Rendec wrote:
> Hi,
>
> While trying to implement u32 hashes in my shaping machine I ran into a
> possible bug in the u32 hash/bucket computing algorithm
> (net/sched/cls_u32.c).
>
> The problem occurs only with hash masks that extend over the octet
> boundary, on little endian machines (where htonl() actually does
> something).
>
> I'm not 100% sure this is a problem with u32 itself, but at least I'm
> sure u32 with the same configuration would behave differently on little
> endian and big endian machines. Detailed description of the problem and
> proposed patch follow.
I think you are right about this different behavior, so it looks like a bug.
And since little endian way is uncontrollable in such a case, your proposal
should be right.
But, since there is a maintainer for this, let's check what is he not payed
for?! (Cc: Jamal Hadi Salim)
Regards,
Jarek P.
>
> Let's say that I would like to use 0x3fc0 as the hash mask. This means 8
> contiguous "1" bits starting at b6. With such a mask, the expected (and
> logical) behavior is to hash any address in, for instance,
> 192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
> bucket 1, then 192.168.0.128/26 in bucket 2 and so on.
>
> This is exactly what would happen on a big endian machine, but on little
> endian machines, what would actually happen with current implementation
> is 0x3fc0 being reversed (into 0xc03f0000) by htonl() in the userspace
> tool and then applied to 192.168.x.x in the u32 classifier. When
> shifting right by 16 bits (rank of first "1" bit in the reversed mask)
> and applying the divisor mask (0xff for divisor 256), what would
> actually remain is 0x3f applied on the "168" octet of the address.
>
> One could say is this can be easily worked around by taking endianness
> into account in userspace and supplying an appropriate mask (0xfc03)
> that would be turned into contiguous "1" bits when reversed
> (0x03fc0000). But the actual problem is the network address (inside the
> packet) not being converted to host order, but used as a host-order
> value when computing the bucket.
>
> Let's say the network address is written as n31 n30 ... n0, with n0
> being the least significant bit. When used directly (without any
> conversion) on a little endian machine, it becomes
> n7 ... n0 n8 ..n15 etc in the machine's registers. Thus bits n7 and n8
> would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26
> would no longer be consecutive.
>
> My approach to this issue was keeping the hash mask in host order and
> converting the octets in the packet to host order before applying the
> mask. This proved to work just fine on my little endian machine, but I'm
> interested in finding out (from you) if this really is an issue with u32
> itself.
>
> My changes to the u32 classifier are attached below as a patch. It was
> made against 2.6.22.9, but applies cleanly on Dave Miller's net-2.6
> tree.
>
> The idea behind my changes is to keep the user space tool intact and
> work everything out in kernel space (because converting the packet
> octets to host order must be done in kernel anyway).
>
> Therefore, hash masks are converted back to host order when a selector
> is configured - in u32_change() - and converted to network order
> (because userspace tools expect to get them in network order from the
> kernel) when a selector is dumped - in u32_dump().
>
> I would like at least to know your opinion about this issue.
>
> Thanks,
>
> Radu Rendec
>
> --- linux-2.6.22.9/net/sched/cls_u32.c.orig 2007-10-30 17:08:03.000000000 +0200
> +++ linux-2.6.22.9/net/sched/cls_u32.c 2007-10-30 17:04:49.000000000 +0200
> @@ -198,7 +198,7 @@
> ht = n->ht_down;
> sel = 0;
> if (ht->divisor)
> - sel = ht->divisor&u32_hash_fold(*(u32*)(ptr+n->sel.hoff), &n->sel,n->fshift);
> + sel = ht->divisor&u32_hash_fold(ntohl(*(u32*)(ptr+n->sel.hoff)), &n->sel,n->fshift);
>
> if (!(n->sel.flags&(TC_U32_VAROFFSET|TC_U32_OFFSET|TC_U32_EAT)))
> goto next_ht;
> @@ -626,6 +626,10 @@
> }
> #endif
>
> + /* userspace tc tool sends us the hmask in network order, but we
> + * need host order, so change it here */
> + s->hmask = ntohl(s->hmask);
> +
> memcpy(&n->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
> n->ht_up = ht;
> n->handle = handle;
> @@ -735,9 +739,14 @@
> u32 divisor = ht->divisor+1;
> RTA_PUT(skb, TCA_U32_DIVISOR, 4, &divisor);
> } else {
> + /* get the address where the selector will be put, then
> + * change the hmask after it is put there */
> + struct tc_u32_sel *s =
> + (struct tc_u32_sel *)RTA_DATA(skb_tail_pointer(skb));
> RTA_PUT(skb, TCA_U32_SEL,
> sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key),
> &n->sel);
> + s->hmask = htonl(s->hmask);
> if (n->ht_up) {
> u32 htid = n->handle & 0xFFFFF000;
> RTA_PUT(skb, TCA_U32_HASH, 4, &htid);
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2007-11-02 17:23 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-11-01 17:55 Endianness problem with u32 classifier hash masks Radu Rendec
2007-11-02 17:31 ` Jarek Poplawski [this message]
2007-11-02 23:23 ` jamal
2007-11-03 23:39 ` Jarek Poplawski
2007-11-03 23:58 ` Jarek Poplawski
2007-11-04 0:30 ` Jarek Poplawski
2007-11-04 1:17 ` Jarek Poplawski
2007-11-04 23:58 ` jamal
2007-11-05 9:12 ` Jarek Poplawski
2007-11-05 12:59 ` Radu Rendec
2007-11-05 13:43 ` jamal
2007-11-05 14:49 ` Jarek Poplawski
2007-11-05 16:12 ` Radu Rendec
2007-11-05 13:52 ` Jarek Poplawski
2007-11-05 14:06 ` jamal
2007-11-05 17:31 ` Radu Rendec
2007-11-05 21:06 ` Jarek Poplawski
2007-11-05 21:28 ` Jarek Poplawski
2007-11-05 22:27 ` jamal
2007-11-06 0:02 ` Jarek Poplawski
2007-11-06 0:12 ` Jarek Poplawski
2007-11-06 8:09 ` Radu Rendec
2007-11-06 13:34 ` jamal
2007-11-06 14:25 ` Jarek Poplawski
2007-11-06 14:43 ` jamal
2007-11-06 17:00 ` Radu Rendec
2007-11-06 20:28 ` Jarek Poplawski
2007-11-07 9:22 ` David Miller
2007-11-07 12:56 ` Jarek Poplawski
2007-11-07 13:42 ` jamal
2007-11-07 13:55 ` Radu Rendec
2007-11-07 14:35 ` Radu Rendec
2007-11-08 11:07 ` [PATCH] [PKT_SCHED] CLS_U32: Use ffs() instead of C code on hash mask to get first set bit Radu Rendec
2007-11-08 11:37 ` Jarek Poplawski
2007-11-08 13:45 ` jamal
2007-11-11 5:55 ` David Miller
2007-11-05 13:47 ` Endianness problem with u32 classifier hash masks jamal
2007-11-05 14:35 ` Jarek Poplawski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=472B5EF1.4020206@o2.pl \
--to=jarkao2@o2.pl \
--cc=hadi@cyberus.ca \
--cc=netdev@vger.kernel.org \
--cc=radu.rendec@ines.ro \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).