From: Jarek Poplawski <jarkao2@o2.pl>
To: Radu Rendec <radu.rendec@ines.ro>
Cc: netdev@vger.kernel.org, jamal <hadi@cyberus.ca>
Subject: Re: Endianness problem with u32 classifier hash masks
Date: Fri, 02 Nov 2007 18:31:29 +0100 [thread overview]
Message-ID: <472B5EF1.4020206@o2.pl> (raw)
In-Reply-To: <1193939701.2987.82.camel@localhost.localdomain>
Radu Rendec wrote:
> Hi,
>
> While trying to implement u32 hashes in my shaping machine I ran into a
> possible bug in the u32 hash/bucket computing algorithm
> (net/sched/cls_u32.c).
>
> The problem occurs only with hash masks that extend over the octet
> boundary, on little endian machines (where htonl() actually does
> something).
>
> I'm not 100% sure this is a problem with u32 itself, but at least I'm
> sure u32 with the same configuration would behave differently on little
> endian and big endian machines. Detailed description of the problem and
> proposed patch follow.
I think you are right about this different behavior, so it looks like a bug.
And since little endian way is uncontrollable in such a case, your proposal
should be right.
But, since there is a maintainer for this, let's check what is he not payed
for?! (Cc: Jamal Hadi Salim)
Regards,
Jarek P.
>
> Let's say that I would like to use 0x3fc0 as the hash mask. This means 8
> contiguous "1" bits starting at b6. With such a mask, the expected (and
> logical) behavior is to hash any address in, for instance,
> 192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
> bucket 1, then 192.168.0.128/26 in bucket 2 and so on.
>
> This is exactly what would happen on a big endian machine, but on little
> endian machines, what would actually happen with current implementation
> is 0x3fc0 being reversed (into 0xc03f0000) by htonl() in the userspace
> tool and then applied to 192.168.x.x in the u32 classifier. When
> shifting right by 16 bits (rank of first "1" bit in the reversed mask)
> and applying the divisor mask (0xff for divisor 256), what would
> actually remain is 0x3f applied on the "168" octet of the address.
>
> One could say is this can be easily worked around by taking endianness
> into account in userspace and supplying an appropriate mask (0xfc03)
> that would be turned into contiguous "1" bits when reversed
> (0x03fc0000). But the actual problem is the network address (inside the
> packet) not being converted to host order, but used as a host-order
> value when computing the bucket.
>
> Let's say the network address is written as n31 n30 ... n0, with n0
> being the least significant bit. When used directly (without any
> conversion) on a little endian machine, it becomes
> n7 ... n0 n8 ..n15 etc in the machine's registers. Thus bits n7 and n8
> would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26
> would no longer be consecutive.
>
> My approach to this issue was keeping the hash mask in host order and
> converting the octets in the packet to host order before applying the
> mask. This proved to work just fine on my little endian machine, but I'm
> interested in finding out (from you) if this really is an issue with u32
> itself.
>
> My changes to the u32 classifier are attached below as a patch. It was
> made against 2.6.22.9, but applies cleanly on Dave Miller's net-2.6
> tree.
>
> The idea behind my changes is to keep the user space tool intact and
> work everything out in kernel space (because converting the packet
> octets to host order must be done in kernel anyway).
>
> Therefore, hash masks are converted back to host order when a selector
> is configured - in u32_change() - and converted to network order
> (because userspace tools expect to get them in network order from the
> kernel) when a selector is dumped - in u32_dump().
>
> I would like at least to know your opinion about this issue.
>
> Thanks,
>
> Radu Rendec
>
> --- linux-2.6.22.9/net/sched/cls_u32.c.orig 2007-10-30 17:08:03.000000000 +0200
> +++ linux-2.6.22.9/net/sched/cls_u32.c 2007-10-30 17:04:49.000000000 +0200
> @@ -198,7 +198,7 @@
> ht = n->ht_down;
> sel = 0;
> if (ht->divisor)
> - sel = ht->divisor&u32_hash_fold(*(u32*)(ptr+n->sel.hoff), &n->sel,n->fshift);
> + sel = ht->divisor&u32_hash_fold(ntohl(*(u32*)(ptr+n->sel.hoff)), &n->sel,n->fshift);
>
> if (!(n->sel.flags&(TC_U32_VAROFFSET|TC_U32_OFFSET|TC_U32_EAT)))
> goto next_ht;
> @@ -626,6 +626,10 @@
> }
> #endif
>
> + /* userspace tc tool sends us the hmask in network order, but we
> + * need host order, so change it here */
> + s->hmask = ntohl(s->hmask);
> +
> memcpy(&n->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
> n->ht_up = ht;
> n->handle = handle;
> @@ -735,9 +739,14 @@
> u32 divisor = ht->divisor+1;
> RTA_PUT(skb, TCA_U32_DIVISOR, 4, &divisor);
> } else {
> + /* get the address where the selector will be put, then
> + * change the hmask after it is put there */
> + struct tc_u32_sel *s =
> + (struct tc_u32_sel *)RTA_DATA(skb_tail_pointer(skb));
> RTA_PUT(skb, TCA_U32_SEL,
> sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key),
> &n->sel);
> + s->hmask = htonl(s->hmask);
> if (n->ht_up) {
> u32 htid = n->handle & 0xFFFFF000;
> RTA_PUT(skb, TCA_U32_HASH, 4, &htid);
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2007-11-02 17:23 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-11-01 17:55 Endianness problem with u32 classifier hash masks Radu Rendec
2007-11-02 17:31 ` Jarek Poplawski [this message]
2007-11-02 23:23 ` jamal
2007-11-03 23:39 ` Jarek Poplawski
2007-11-03 23:58 ` Jarek Poplawski
2007-11-04 0:30 ` Jarek Poplawski
2007-11-04 1:17 ` Jarek Poplawski
2007-11-04 23:58 ` jamal
2007-11-05 9:12 ` Jarek Poplawski
2007-11-05 12:59 ` Radu Rendec
2007-11-05 13:43 ` jamal
2007-11-05 14:49 ` Jarek Poplawski
2007-11-05 16:12 ` Radu Rendec
2007-11-05 13:52 ` Jarek Poplawski
2007-11-05 14:06 ` jamal
2007-11-05 17:31 ` Radu Rendec
2007-11-05 21:06 ` Jarek Poplawski
2007-11-05 21:28 ` Jarek Poplawski
2007-11-05 22:27 ` jamal
2007-11-06 0:02 ` Jarek Poplawski
2007-11-06 0:12 ` Jarek Poplawski
2007-11-06 8:09 ` Radu Rendec
2007-11-06 13:34 ` jamal
2007-11-06 14:25 ` Jarek Poplawski
2007-11-06 14:43 ` jamal
2007-11-06 17:00 ` Radu Rendec
2007-11-06 20:28 ` Jarek Poplawski
2007-11-07 9:22 ` David Miller
2007-11-07 12:56 ` Jarek Poplawski
2007-11-07 13:42 ` jamal
2007-11-07 13:55 ` Radu Rendec
2007-11-07 14:35 ` Radu Rendec
2007-11-08 11:07 ` [PATCH] [PKT_SCHED] CLS_U32: Use ffs() instead of C code on hash mask to get first set bit Radu Rendec
2007-11-08 11:37 ` Jarek Poplawski
2007-11-08 13:45 ` jamal
2007-11-11 5:55 ` David Miller
2007-11-05 13:47 ` Endianness problem with u32 classifier hash masks jamal
2007-11-05 14:35 ` Jarek Poplawski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=472B5EF1.4020206@o2.pl \
--to=jarkao2@o2.pl \
--cc=hadi@cyberus.ca \
--cc=netdev@vger.kernel.org \
--cc=radu.rendec@ines.ro \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.