qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Laurent Vivier <laurent@vivier.eu>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: Riku Voipio <riku.voipio@iki.fi>,
	QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] linux-user: manage SOCK_PACKET socket type.
Date: Tue, 27 Oct 2015 11:47:38 +0100	[thread overview]
Message-ID: <562F564A.5040902@vivier.eu> (raw)
In-Reply-To: <562EEAD4.7070706@vivier.eu>



Le 27/10/2015 04:09, Laurent Vivier a écrit :
> 
> 
> Le 26/10/2015 15:40, Peter Maydell a écrit :
>> On 6 October 2015 at 18:11, Laurent Vivier <laurent@vivier.eu> wrote:
>>> This is obsolete, but if we want to use dhcp with some distros (like debian
>>> ppc 8.2 jessie), we need it.
>>>
>>> At the bind level, we are not able to know the socket type so we try to
>>> guess it by analyzing the name. We manage only the case "ethX",
>>> "ethX" in spk_device is similar to set htons(0x6574) in sll_protocol in the
>>> normal case, and as this protocol does not exist, it's ok.
>>>
>>> SOCK_PACKET uses network endian to encode protocol in socket()
>>>
>>> in PACKET(7) :
>>>                                  protocol is the  IEEE  802.3  protocol
>>> number in network order.  See the <linux/if_ether.h> include file for a
>>> list of allowed protocols.  When protocol is  set  to  htons(ETH_P_ALL)
>>> then all protocols are received.  All incoming packets of that protocol
>>> type will be passed to the packet socket before they are passed to  the
>>> protocols implemented in the kernel.
>>>
>>> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
>>> ---
>>> This patch is a remix of an old patch sent in 2012:
>>> https://patchwork.ozlabs.org/patch/208892/
>>>
>>>  linux-user/syscall.c | 33 +++++++++++++++++++++++++++------
>>>  1 file changed, 27 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>>> index 64be431..71cc1e2 100644
>>> --- a/linux-user/syscall.c
>>> +++ b/linux-user/syscall.c
>>> @@ -111,6 +111,7 @@ int __clone2(int (*fn)(void *), void *child_stack_base,
>>>  #include <linux/route.h>
>>>  #include <linux/filter.h>
>>>  #include <linux/blkpg.h>
>>> +#include <linux/if_packet.h>
>>>  #include "linux_loop.h"
>>>  #include "uname.h"
>>>
>>> @@ -1198,11 +1199,20 @@ static inline abi_long target_to_host_sockaddr(struct sockaddr *addr,
>>>      memcpy(addr, target_saddr, len);
>>>      addr->sa_family = sa_family;
>>>      if (sa_family == AF_PACKET) {
>>> -       struct target_sockaddr_ll *lladdr;
>>> +        /* Manage an obsolete case :
>>> +         * if socket type is SOCK_PACKET, bind by name otherwise by index
>>> +         * but we are not able to know socket type, so check if the name
>>> +         * is usable...
>>> +         * see linux/net/packet/af_packet.c: packet_bind_spkt()
>>> +         */
>>> +        if (strncmp((char *)((struct sockaddr_pkt *)addr)->spkt_device,
>>> +                    "eth", 3) != 0) {
>>> +            struct target_sockaddr_ll *lladdr;
>>
>> This confuses me. The packet(7) manpage suggests there are two flavours
>> of packet socket:
>>  (1) legacy AF_INET + SOCK_PACKET
>>  (2) new style AF_PACKET + SOCK_RAW / SOCK_DGRAM
>>
>> but this comment suggests it's trying to handle AF_PACKET + SOCK_PACKET ?
> 
> In fact, I've started not from the man page, but from a non working dhcp
> client, originally with a m68k target and etch-m68k distro, and I've met
> again this problem on a ppc target and jessie distro.
> 
>>
>> If AF_PACKET was introduced as the new way of doing things, it's not
>> clear why it would be the family type used in the legacy approach's
>> sockaddr_pkt (though there seems to be code around that does this).
>> I suspect that 2.0 kernels just didn't check af_family here at all.
> 
> by digging into the code, I have found:
> 
> $ apt-get source isc-dhcp-client
> $ vi isc-dhcp-4.3.1/common/lpf.c
> 
> ...
>  72 int if_register_lpf (info)
>  73         struct interface_info *info;
>  74 {
> ...
>  79         if ((sock = socket(PF_PACKET, SOCK_PACKET,
>  80                            htons((short)ETH_P_ALL))) < 0) {
> ...
> 
> So we can see socket() is used with domain PF_PACKET and type SOCK_PACKET.
> 
> Next, the interface name is put into the sa_data of sockaddr, and bind()
> is used with AF_PACKET:
> 
>  94         /* Bind to the interface name */
>  95         memset (&sa, 0, sizeof sa);
>  96         sa.sa_family = AF_PACKET;
>  97         strncpy (sa.sa_data, (const char *)info -> ifp, sizeof
> sa.sa_data);
>  98         if (bind (sock, &sa, sizeof sa)) {
> 
> ifp is initialized from a list of all discovered interfaces in
> common/discover.c:
> ...
>  238 int
>  239 begin_iface_scan(struct iface_conf_list *ifaces) {
> ...
>  283         if (ioctl(ifaces->sock, SIOCGLIFCONF, &ifaces->conf) < 0) {
> ...
>  918 void
>  919 discover_interfaces(int state) {
> ...
>  924         struct interface_info *tmp;
> ...
>  939         if (!begin_iface_scan(&ifaces)) {
> ...
>  955         while (next_iface(&info, &err, &ifaces)) {
>  956
>  957                 /* See if we've seen an interface that matches this
> one. */
>  958                 for (tmp = interfaces; tmp; tmp = tmp->next) {
>  959                         if (!strcmp(tmp->name, info.name))
>  960                                 break;
>  961                 }
> 
> ...
>  987                         strcpy(tmp->name, info.name);
> ...
> 1050         }
> ...
> 1063         for (tmp = interfaces ; tmp != NULL ; tmp = tmp->next) {
> 1064                 if (tmp->ifp == NULL) {
> 1065                         struct ifreq *tif;
> 1066
> 1067                         tif = (struct ifreq *)dmalloc(sizeof(struct
> ifreq),
> 1068                                                       MDL);
> 1069                         if (tif == NULL)
> 1070                                 log_fatal("no space for ifp mockup.");
> 1071                         strcpy(tif->ifr_name, tmp->name);
> 1072                         tmp->ifp = tif;
> 1073                 }
> 1074         }
> 
>>
>> The code in the kernel's packet_recvmsg fills in the spkt_family
>> field with the netdevice's type field, which is an ARPHRD_* constant
>> as far as I can tell (I could well be wrong here).
> 
> kernel 4.3.0-rc3, net/packet/af_packet.c:
> 
>    2961
>    2962 static int packet_bind_spkt(struct socket *sock, struct sockaddr
> *uaddr,
>    2963                             int addr_len)
>    2964 {
>    2965         struct sock *sk = sock->sk;
>    2966         char name[15];
>    2967         struct net_device *dev;
>    2968         int err = -ENODEV;
>    2969
>    2970         /*
>    2971          *      Check legality
>    2972          */
>    2973
>    2974         if (addr_len != sizeof(struct sockaddr))
>    2975                 return -EINVAL;
>    2976         strlcpy(name, uaddr->sa_data, sizeof(name));
>    2977
>    2978         dev = dev_get_by_name(sock_net(sk), name);
>    2979         if (dev)
>    2980                 err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
>    2981         return err;
>    2982 }
> ...
>    4246 static const struct proto_ops packet_ops_spkt = {
>    4247         .family =       PF_PACKET,
> ...
>    4250         .bind =         packet_bind_spkt,
> ...
>    3022
>    3023 static int packet_create(struct net *net, struct socket *sock,
> int protocol,
>    3024                          int kern)
> ...
>    3045         if (sock->type == SOCK_PACKET)
>    3046                 sock->ops = &packet_ops_spkt;
> ...
> 
>> Odd to have code in target_to_host_sockaddr and not in
>> host_to_target_sockaddr.
> 
> In the case of host_to_target_sockaddr(), there is no "if (sa_family ==
> AF_PACKET) {" as it was in target_to_host_sockaddr(), it's why I didn't
> add it (and I don't like to add code I don't test).
> 
>>
>>> -       lladdr = (struct target_sockaddr_ll *)addr;
>>> -       lladdr->sll_ifindex = tswap32(lladdr->sll_ifindex);
>>> -       lladdr->sll_hatype = tswap16(lladdr->sll_hatype);
>>> +            lladdr = (struct target_sockaddr_ll *)addr;
>>> +            lladdr->sll_ifindex = tswap32(lladdr->sll_ifindex);
>>> +            lladdr->sll_hatype = tswap16(lladdr->sll_hatype);
>>> +        }
>>>      }
>>>      unlock_user(target_saddr, target_addr, 0);
>>>
>>> @@ -2509,7 +2519,12 @@ static abi_long do_socketcall(int num, abi_ulong vptr)
>>>      /* now when we have the args, actually handle the call */
>>>      switch (num) {
>>>      case SOCKOP_socket: /* domain, type, protocol */
>>> -        return do_socket(a[0], a[1], a[2]);
>>> +        if (a[0] == AF_PACKET ||
>>> +            a[1] == TARGET_SOCK_PACKET) {
>>> +            return do_socket(a[0], a[1], tswap16(a[2]));
>>> +        } else {
>>> +            return do_socket(a[0], a[1], a[2]);
>>> +        }
>>>      case SOCKOP_bind: /* sockfd, addr, addrlen */
>>>          return do_bind(a[0], a[1], a[2]);
>>>      case SOCKOP_connect: /* sockfd, addr, addrlen */
>>> @@ -7500,7 +7515,13 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
>>>  #endif
>>>  #ifdef TARGET_NR_socket
>>>      case TARGET_NR_socket:
>>> -        ret = do_socket(arg1, arg2, arg3);
>>> +        if (arg1 == AF_PACKET ||
>>> +            arg2 == TARGET_SOCK_PACKET) {
>>> +            /* in this case, socket() needs a network endian short */
>>> +            ret = do_socket(arg1, arg2, tswap16(arg3));
>>> +        } else {
>>> +            ret = do_socket(arg1, arg2, arg3);
>>> +        }
>>
>> This doesn't make sense to me. The argument to socket()
>> is passed in via a register; so if the guest code correctly
>> passes the protocol value as htons(whatever) then that will
>> be the value in arg3 and we do not need to swap anything.
>>
>> I see we had this discussion about the previous version of the
>> patch too... and my argument that we don't need the tswap16
>> in the socketcall code path either still makes sense to me.
> 
> I agree with you, I think I have confused socketcall() that passes
> parameters by memory, and socket() that passes parameters by registers.
> I will remove this part and resend a patch.

And for the socketcall part, we need the tswap16():

for instance,

    int a = htons(0x0003);

On a LE host:

    a = 0x00000300

On a BE host:

    a = 0x00000003

If the guest is BE, it will put in memory:

    0x00 0x00 0x00 0x03

Then a LE host, will read:

    int b = 0x03000000

but get_user_ual() in do_socketcall() will byte-swap it and put
0x00000003 in a[2].

so without the byte-swap, we call do_socket(..., 0x0003),
whereas the syscall is waiting for htons(0x0003) -> 0x0300 as we are on
LE host.

I'm sorry, I can't explain that better...

>>
>>>          fd_trans_unregister(ret);
>>>          break;
>>>  #endif
>>> --
>>> 2.4.3
>>
>> thanks
>> -- PMM
>>
> 
> Thank you for your comments,
> Laurent
> 

  reply	other threads:[~2015-10-27 10:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-06 17:11 [Qemu-devel] [PATCH] linux-user: manage SOCK_PACKET socket type Laurent Vivier
2015-10-26 14:40 ` Peter Maydell
2015-10-27  3:09   ` Laurent Vivier
2015-10-27 10:47     ` Laurent Vivier [this message]
2015-10-27 11:35       ` Peter Maydell
2015-10-27 11:39         ` Peter Maydell
2015-10-27 11:49           ` Laurent Vivier
2015-10-27 11:52             ` Peter Maydell
2015-10-27 11:56               ` Laurent Vivier
2015-10-27 11:54         ` Laurent Vivier
2015-10-27 11:50     ` Peter Maydell
2015-10-27 11:54       ` Laurent Vivier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=562F564A.5040902@vivier.eu \
    --to=laurent@vivier.eu \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=riku.voipio@iki.fi \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).