Netdev List
* Re: [PATCH v4 1/1] can: add pruss CAN driver.
From: Wolfgang Grandegger @ 2011-05-04 16:09 UTC
  To: Arnd Bergmann, Subhasish Ghosh, linux-arm-kernel,
	Marc Kleine-Budde, sachi
In-Reply-To: <20110504155750.GC322@e-circ.dyndns.org>

Hi Kurt,

On 05/04/2011 05:57 PM, Kurt Van Dijck wrote:
>>
>>> How hard would it be to implement that feature in Socket CAN?
>>
>> CAN controllers usually provide some kind of hardware CAN id filtering,
>> but in a very hardware dependent way. A generic interface may be able to
>> handle the PRUSS restrictions as well. CAN devices are usually
>> configured through the netlink interface. e.g.
>>
>>   $ ip link set can0 up type can bitrate 125000
>>
>> and such a common interface would be netlink based as well.
> ack.
>>
>>> Is that something that Subhasish or someone else could do as a prerequisite
>>> to merging the driver?
>>
>> Any ideas on how to handle hardware filtering in a generic way are
>> welcome. I will try to come up with a proposal sooner than later.
> 
> When doing so, I'd vote for an unlimited (by software) list of hardware filters (id/mask).
> The hardware must abort when no more filters are available.

Sounds good and not even too complicated. For the SJA1000 we would just
allow setting the global mask.
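
To make the id/mask idea concrete, here is a minimal user-space sketch of
how such a filter list could be evaluated. It is only an illustration: the
names hw_can_filter, hw_can_filter_match and hw_can_accept are made up for
this example and do not come from any driver or from a proposed netlink
interface. The matching rule itself is the one SocketCAN's software
struct can_filter already uses: a frame matches when it agrees with the
filter id on every bit set in the mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter entry: one id/mask pair. */
struct hw_can_filter {
	uint32_t id;
	uint32_t mask;
};

/* A frame matches when it equals the filter id on all bits set in mask. */
bool hw_can_filter_match(const struct hw_can_filter *f, uint32_t rx_id)
{
	return (rx_id & f->mask) == (f->id & f->mask);
}

/*
 * Accept a frame if any filter in the list matches.
 * Assumption for this sketch: an empty list means filtering is off,
 * so every frame is accepted (the "off by default" behaviour).
 */
bool hw_can_accept(const struct hw_can_filter *list, size_t n, uint32_t rx_id)
{
	size_t i;

	if (n == 0)
		return true;

	for (i = 0; i < n; i++) {
		if (hw_can_filter_match(&list[i], rx_id))
			return true;
	}
	return false;
}
```

With a single entry { .id = 0x100, .mask = 0x700 }, ids 0x100 to 0x1ff
pass and everything else is dropped. A controller with only one global
mask, like the SJA1000 case mentioned above, would correspond to a list
that can hold exactly one entry.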

> I think that when using hardware filters, knowing the actual device
> with its number of hardware filters is the least of your problems.
> Userspace applications that suddenly stop working properly due to
> hw filters (i.e. some traffic not coming in anymore) will be a major
> source of bug reports.

Well, hardware filtering will be off by default and must explicitly be
set by the user, like for the bitrate setting.

Wolfgang.



* Re: [PATCH V4 5/8]macvtap: macvtap TX zero-copy support
From: Michael S. Tsirkin @ 2011-05-04 16:12 UTC
  To: Shirley Ma
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <1304523449.7076.30.camel@localhost.localdomain>

On Wed, May 04, 2011 at 08:37:29AM -0700, Shirley Ma wrote:
> On Wed, 2011-05-04 at 17:58 +0300, Michael S. Tsirkin wrote:
> > On Wed, May 04, 2011 at 01:14:53AM -0700, Shirley Ma wrote:
> > > Only when buffer size is greater than GOODCOPY_LEN (256), macvtap
> > > enables zero-copy.
> > > 
> > > Signed-off-by: Shirley Ma <xma@us.ibm.com>
> > 
> > 
> > Looks good. Some thoughts below.
> > 
> > > ---
> > > 
> > >  drivers/net/macvtap.c |  126
> > ++++++++++++++++++++++++++++++++++++++++++++----
> > >  1 files changed, 115 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > > index 6696e56..e8bc5ff 100644
> > > --- a/drivers/net/macvtap.c
> > > +++ b/drivers/net/macvtap.c
> > > @@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
> > >   */
> > >  static dev_t macvtap_major;
> > >  #define MACVTAP_NUM_DEVS 65536
> > > +#define GOODCOPY_LEN 256
> > 
> > Scope with MACVTAP_ please.
> Ok.
> 
> > For small packets, is it better to copy in vhost
> > and skip all the back and forth with callbacks? If yes, does
> > it make sense to put the constant above in some header
> > shared with vhost-net?
> 
> skb is created in macvtap, the small packet copy is in skb, so I don't
> think we can do it in vhost here.

One simple way is to pass NULL instead of the pend pointer
when we want macvtap to copy unconditionally.
vhost-net will then know that the packet was copied and can notify
the user unconditionally.

I think this will solve the small packet regression you see ... no?

> > >  static struct class *macvtap_class;
> > >  static struct cdev macvtap_cdev;
> > >  
> > > @@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode,
> > struct file *file)
> > >  {
> > >       struct net *net = current->nsproxy->net_ns;
> > >       struct net_device *dev = dev_get_by_index(net, iminor(inode));
> > > +     struct macvlan_dev *vlan = netdev_priv(dev);
> > >       struct macvtap_queue *q;
> > >       int err;
> > >  
> > > @@ -369,6 +371,16 @@ static int macvtap_open(struct inode *inode,
> > struct file *file)
> > >       q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
> > >       q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
> > >  
> > > +     /*
> > > +      * so far only VM uses macvtap, enable zero copy between guest
> > > +      * kernel and host kernel when lower device supports high
> > memory
> > > +      * DMA
> > > +      */
> > > +     if (vlan) {
> > > +             if (vlan->lowerdev->features & NETIF_F_ZEROCOPY)
> > > +                     sock_set_flag(&q->sk, SOCK_ZEROCOPY);
> > > +     }
> > > +
> > >       err = macvtap_set_queue(dev, file, q);
> > >       if (err)
> > >               sock_put(&q->sk);
> > > @@ -433,6 +445,80 @@ static inline struct sk_buff
> > *macvtap_alloc_skb(struct sock *sk, size_t prepad,
> > >       return skb;
> > >  }
> > >  
> > > +/* set skb frags from iovec, this can move to core network code for
> > reuse */
> > > +static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct
> > iovec *from,
> > > +                               int offset, size_t count)
> > > +{
> > > +     int len = iov_length(from, count) - offset;
> > > +     int copy = skb_headlen(skb);
> > > +     int size, offset1 = 0;
> > > +     int i = 0;
> > > +     skb_frag_t *f;
> > > +
> > > +     /* Skip over from offset */
> > > +     while (offset >= from->iov_len) {
> > > +             offset -= from->iov_len;
> > > +             ++from;
> > > +             --count;
> > > +     }
> > > +
> > > +     /* copy up to skb headlen */
> > > +     while (copy > 0) {
> > > +             size = min_t(unsigned int, copy, from->iov_len -
> > offset);
> > > +             if (copy_from_user(skb->data + offset1, from->iov_base
> > + offset,
> > > +                                size))
> > > +                     return -EFAULT;
> > > +             if (copy > size) {
> > > +                     ++from;
> > > +                     --count;
> > > +             }
> > > +             copy -= size;
> > > +             offset1 += size;
> > > +             offset = 0;
> > > +     }
> > > +
> > > +     if (len == offset1)
> > > +             return 0;
> > > +
> > > +     while (count--) {
> > > +             struct page *page[MAX_SKB_FRAGS];
> > > +             int num_pages;
> > > +             unsigned long base;
> > > +
> > > +             len = from->iov_len - offset1;
> > > +             if (!len) {
> > > +                     offset1 = 0;
> > > +                     ++from;
> > > +                     continue;
> > > +             }
> > > +             base = (unsigned long)from->iov_base + offset1;
> > > +             size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >>
> > PAGE_SHIFT;
> > > +             num_pages = get_user_pages_fast(base, size, 0,
> > &page[i]);
> > > +             if ((num_pages != size) ||
> > > +                 (num_pages > MAX_SKB_FRAGS -
> > skb_shinfo(skb)->nr_frags))
> > > +                     /* put_page is in skb free */
> > > +                     return -EFAULT;
> > > +             skb->data_len += len;
> > > +             skb->len += len;
> > > +             skb->truesize += len;
> > > +             while (len) {
> > > +                     f = &skb_shinfo(skb)->frags[i];
> > > +                     f->page = page[i];
> > > +                     f->page_offset = base & ~PAGE_MASK;
> > > +                     f->size = min_t(int, len, PAGE_SIZE -
> > f->page_offset);
> > > +                     skb_shinfo(skb)->nr_frags++;
> > > +                     /* increase sk_wmem_alloc */
> > > +                     atomic_add(f->size, &skb->sk->sk_wmem_alloc);
> > 
> > > One thing that gave me pause is that we only account for part of the page
> > > here. I think we should count the whole page, no?
> 
> The whole page is pinned, but it might not be for this sock only. 

Right, but worst-case it is, so I think we should stay on the safe side
and limit what the user can do.
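
To put numbers on the difference, here is a small user-space sketch. It
reuses the page-count arithmetic from the patch and compares it with
charging every pinned page in full. The EX_* macros and both function
names are invented for this example, and 4 KiB pages are an assumption;
this is not code from the patch.

```c
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define EX_PAGE_SHIFT 12UL
#define EX_PAGE_SIZE  (1UL << EX_PAGE_SHIFT)
#define EX_PAGE_MASK  (~(EX_PAGE_SIZE - 1))

/*
 * Number of pages touched by the user buffer [base, base + len).
 * Same arithmetic as the "size = ..." line in zerocopy_sg_from_iovec():
 * offset within the first page, plus length, rounded up to whole pages.
 */
unsigned long ex_pages_spanned(unsigned long base, unsigned long len)
{
	return ((base & ~EX_PAGE_MASK) + len + ~EX_PAGE_MASK) >> EX_PAGE_SHIFT;
}

/* Bytes charged when every pinned page is counted in full. */
unsigned long ex_charge_whole_pages(unsigned long base, unsigned long len)
{
	return ex_pages_spanned(base, len) * EX_PAGE_SIZE;
}
```

A 32-byte buffer that starts 16 bytes before a page boundary pins two
pages. Charging only the bytes used accounts for 32 bytes, while charging
whole pages accounts for 8192, which is the worst case described above.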

> I think I should move the atomic_add outside the loop to save cost.

Sure, good idea.

> > > +                     base += f->size;
> > > +                     len -= f->size;
> > > +                     i++;
> > > +             }
> > > +             offset1 = 0;
> > > +             ++from;
> > > +     }
> > > +     return 0;
> > > +}
> > > +
> > >  /*
> > >   * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should
> > >   * be shared with the tun/tap driver.
> > > @@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const
> > struct sk_buff *skb,
> > >  
> > >  
> > >  /* Get packet from user space buffer */
> > > -static ssize_t macvtap_get_user(struct macvtap_queue *q,
> > > -                             const struct iovec *iv, size_t count,
> > > -                             int noblock)
> > > +static ssize_t macvtap_get_user(struct macvtap_queue *q, struct
> > msghdr *m,
> > > +                             const struct iovec *iv, unsigned long
> > total_len,
> > > +                             size_t count, int noblock)
> > >  {
> > >       struct sk_buff *skb;
> > >       struct macvlan_dev *vlan;
> > > -     size_t len = count;
> > > +     unsigned long len = total_len;
> > >       int err;
> > >       struct virtio_net_hdr vnet_hdr = { 0 };
> > >       int vnet_hdr_len = 0;
> > > +     int copylen, zerocopy;
> > >  
> > > +     zerocopy = sock_flag(&q->sk, SOCK_ZEROCOPY) && (len >
> > GOODCOPY_LEN);
> > >       if (q->flags & IFF_VNET_HDR) {
> > >               vnet_hdr_len = q->vnet_hdr_sz;
> > >  
> > > @@ -552,12 +640,28 @@ static ssize_t macvtap_get_user(struct
> > macvtap_queue *q,
> > >       if (unlikely(len < ETH_HLEN))
> > >               goto err;
> > >  
> > > -     skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, len,
> > vnet_hdr.hdr_len,
> > > -                             noblock, &err);
> > > +     if (zerocopy)
> > > +             /* There are 256 bytes to be copied in skb, so there
> > is enough
> > > +              * room for skb expand head in case it is used.
> > > +              * The rest buffer is mapped from userspace.
> > > +              */
> > > +             copylen = GOODCOPY_LEN;
> > 
> > Just curious: where does the number 256 come from?
> > Also, as long as we are copying, should we care about
> > alignment?
> 
> 256 makes the size big enough for any skb head expanding.
> 
> That's my concern before.

I'm not sure I understand.
Could you please explain how we know that 256 is big enough for any
skb head expansion?

> But the guest should have aligned the buffer already; after moving
> the pointer 256 bytes it should still be aligned, right?

I mean in the host. But whatever, it's not that important at this point.

> > > +     else
> > > +             copylen = len;
> > > +
> > > +     skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, copylen,
> > > +                             vnet_hdr.hdr_len, noblock, &err);
> > >       if (!skb)
> > >               goto err;
> > >  
> > > -     err = skb_copy_datagram_from_iovec(skb, 0, iv, vnet_hdr_len,
> > len);
> > > +     if (zerocopy)
> > > +             err = zerocopy_sg_from_iovec(skb, iv, vnet_hdr_len,
> > count);
> > > +     else
> > > +             err = skb_copy_datagram_from_iovec(skb, 0, iv,
> > vnet_hdr_len,
> > > +                                                len);
> > > +     if (sock_flag(&q->sk, SOCK_ZEROCOPY))
> > > +             memcpy(&skb_shinfo(skb)->ubuf, m->msg_control,
> > > +                     sizeof(struct skb_ubuf_info));
> > >       if (err)
> > >               goto err_kfree;
> > >  
> > > @@ -579,7 +683,7 @@ static ssize_t macvtap_get_user(struct
> > macvtap_queue *q,
> > >               kfree_skb(skb);
> > >       rcu_read_unlock_bh();
> > >  
> > > -     return count;
> > > +     return total_len;
> > >  
> > >  err_kfree:
> > >       kfree_skb(skb);
> > > @@ -601,8 +705,8 @@ static ssize_t macvtap_aio_write(struct kiocb
> > *iocb, const struct iovec *iv,
> > >       ssize_t result = -ENOLINK;
> > >       struct macvtap_queue *q = file->private_data;
> > >  
> > > -     result = macvtap_get_user(q, iv, iov_length(iv, count),
> > > -                           file->f_flags & O_NONBLOCK);
> > > +     result = macvtap_get_user(q, NULL, iv, iov_length(iv, count),
> > count,
> > > +                               file->f_flags & O_NONBLOCK);
> > >       return result;
> > >  }
> > >  
> > > @@ -815,7 +919,7 @@ static int macvtap_sendmsg(struct kiocb *iocb,
> > struct socket *sock,
> > >                          struct msghdr *m, size_t total_len)
> > >  {
> > >       struct macvtap_queue *q = container_of(sock, struct
> > macvtap_queue, sock);
> > > -     return macvtap_get_user(q, m->msg_iov, total_len,
> > > +     return macvtap_get_user(q, m, m->msg_iov, total_len,
> > m->msg_iovlen,
> > >                           m->msg_flags & MSG_DONTWAIT);
> > >  }
> > >  
> > > 
> > 
> > 


* Re: [PATCH V4 5/8]macvtap: macvtap TX zero-copy support
From: Michael S. Tsirkin @ 2011-05-04 16:14 UTC
  To: Shirley Ma
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <1304523449.7076.30.camel@localhost.localdomain>

On Wed, May 04, 2011 at 08:37:29AM -0700, Shirley Ma wrote:
> On Wed, 2011-05-04 at 17:58 +0300, Michael S. Tsirkin wrote:
> > On Wed, May 04, 2011 at 01:14:53AM -0700, Shirley Ma wrote:
> > > Only when buffer size is greater than GOODCOPY_LEN (256), macvtap
> > > enables zero-copy.
> > > 
> > > Signed-off-by: Shirley Ma <xma@us.ibm.com>
> > 
> > 
> > Looks good. Some thoughts below.
> > 
> > > ---
> > > 
> > >  drivers/net/macvtap.c |  126
> > ++++++++++++++++++++++++++++++++++++++++++++----
> > >  1 files changed, 115 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > > index 6696e56..e8bc5ff 100644
> > > --- a/drivers/net/macvtap.c
> > > +++ b/drivers/net/macvtap.c
> > > @@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
> > >   */
> > >  static dev_t macvtap_major;
> > >  #define MACVTAP_NUM_DEVS 65536
> > > +#define GOODCOPY_LEN 256
> > 
> > Scope with MACVTAP_ please.
> Ok.
> 
> > For small packets, is it better to copy in vhost
> > and skip all the back and forth with callbacks? If yes, does
> > it make sense to put the constant above in some header
> > shared with vhost-net?
> 
> skb is created in macvtap, the small packet copy is in skb, so I don't
> think we can do it in vhost here.

BTW this is not very important, it might or might not
result in some speedup. Let's focus on getting it working
right.

-- 
MST


* Re: [PATCH V4 5/8]macvtap: macvtap TX zero-copy support
From: Shirley Ma @ 2011-05-04 16:25 UTC
  To: Michael S. Tsirkin
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <20110504161223.GC15648@redhat.com>

On Wed, 2011-05-04 at 19:12 +0300, Michael S. Tsirkin wrote:
> On Wed, May 04, 2011 at 08:37:29AM -0700, Shirley Ma wrote:
> > On Wed, 2011-05-04 at 17:58 +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 04, 2011 at 01:14:53AM -0700, Shirley Ma wrote:
> > > > Only when buffer size is greater than GOODCOPY_LEN (256),
> macvtap
> > > > enables zero-copy.
> > > > 
> > > > Signed-off-by: Shirley Ma <xma@us.ibm.com>
> > > 
> > > 
> > > Looks good. Some thoughts below.
> > > 
> > > > ---
> > > > 
> > > >  drivers/net/macvtap.c |  126
> > > ++++++++++++++++++++++++++++++++++++++++++++----
> > > >  1 files changed, 115 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > > > index 6696e56..e8bc5ff 100644
> > > > --- a/drivers/net/macvtap.c
> > > > +++ b/drivers/net/macvtap.c
> > > > @@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
> > > >   */
> > > >  static dev_t macvtap_major;
> > > >  #define MACVTAP_NUM_DEVS 65536
> > > > +#define GOODCOPY_LEN 256
> > > 
> > > Scope with MACVTAP_ please.
> > Ok.
> > 
> > > For small packets, is it better to copy in vhost
> > > and skip all the back and forth with callbacks? If yes, does
> > > it make sense to put the constant above in some header
> > > shared with vhost-net?
> > 
> > skb is created in macvtap, the small packet copy is in skb, so I
> don't
> > think we can do it in vhost here.
> 
> One simple way is to pass NULL instead of the pend pointer
> for when we want macvtap to copy unconditionally.
> vhost-net will know that packet is copied and can notify user
> unconditionally.
> 
> I think this will solve the small packet regression you see ... no?

I can certainly test it out. The small packet regression seems to come
from more guest exits. This patch doesn't do anything to cause more guest
exits, but it did speed up the sendmsg() path.

> > > >  static struct class *macvtap_class;
> > > >  static struct cdev macvtap_cdev;
> > > >  
> > > > @@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode,
> > > struct file *file)
> > > >  {
> > > >       struct net *net = current->nsproxy->net_ns;
> > > >       struct net_device *dev = dev_get_by_index(net,
> iminor(inode));
> > > > +     struct macvlan_dev *vlan = netdev_priv(dev);
> > > >       struct macvtap_queue *q;
> > > >       int err;
> > > >  
> > > > @@ -369,6 +371,16 @@ static int macvtap_open(struct inode
> *inode,
> > > struct file *file)
> > > >       q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
> > > >       q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
> > > >  
> > > > +     /*
> > > > +      * so far only VM uses macvtap, enable zero copy between
> guest
> > > > +      * kernel and host kernel when lower device supports high
> > > memory
> > > > +      * DMA
> > > > +      */
> > > > +     if (vlan) {
> > > > +             if (vlan->lowerdev->features & NETIF_F_ZEROCOPY)
> > > > +                     sock_set_flag(&q->sk, SOCK_ZEROCOPY);
> > > > +     }
> > > > +
> > > >       err = macvtap_set_queue(dev, file, q);
> > > >       if (err)
> > > >               sock_put(&q->sk);
> > > > @@ -433,6 +445,80 @@ static inline struct sk_buff
> > > *macvtap_alloc_skb(struct sock *sk, size_t prepad,
> > > >       return skb;
> > > >  }
> > > >  
> > > > +/* set skb frags from iovec, this can move to core network code
> for
> > > reuse */
> > > > +static int zerocopy_sg_from_iovec(struct sk_buff *skb, const
> struct
> > > iovec *from,
> > > > +                               int offset, size_t count)
> > > > +{
> > > > +     int len = iov_length(from, count) - offset;
> > > > +     int copy = skb_headlen(skb);
> > > > +     int size, offset1 = 0;
> > > > +     int i = 0;
> > > > +     skb_frag_t *f;
> > > > +
> > > > +     /* Skip over from offset */
> > > > +     while (offset >= from->iov_len) {
> > > > +             offset -= from->iov_len;
> > > > +             ++from;
> > > > +             --count;
> > > > +     }
> > > > +
> > > > +     /* copy up to skb headlen */
> > > > +     while (copy > 0) {
> > > > +             size = min_t(unsigned int, copy, from->iov_len -
> > > offset);
> > > > +             if (copy_from_user(skb->data + offset1,
> from->iov_base
> > > + offset,
> > > > +                                size))
> > > > +                     return -EFAULT;
> > > > +             if (copy > size) {
> > > > +                     ++from;
> > > > +                     --count;
> > > > +             }
> > > > +             copy -= size;
> > > > +             offset1 += size;
> > > > +             offset = 0;
> > > > +     }
> > > > +
> > > > +     if (len == offset1)
> > > > +             return 0;
> > > > +
> > > > +     while (count--) {
> > > > +             struct page *page[MAX_SKB_FRAGS];
> > > > +             int num_pages;
> > > > +             unsigned long base;
> > > > +
> > > > +             len = from->iov_len - offset1;
> > > > +             if (!len) {
> > > > +                     offset1 = 0;
> > > > +                     ++from;
> > > > +                     continue;
> > > > +             }
> > > > +             base = (unsigned long)from->iov_base + offset1;
> > > > +             size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >>
> > > PAGE_SHIFT;
> > > > +             num_pages = get_user_pages_fast(base, size, 0,
> > > &page[i]);
> > > > +             if ((num_pages != size) ||
> > > > +                 (num_pages > MAX_SKB_FRAGS -
> > > skb_shinfo(skb)->nr_frags))
> > > > +                     /* put_page is in skb free */
> > > > +                     return -EFAULT;
> > > > +             skb->data_len += len;
> > > > +             skb->len += len;
> > > > +             skb->truesize += len;
> > > > +             while (len) {
> > > > +                     f = &skb_shinfo(skb)->frags[i];
> > > > +                     f->page = page[i];
> > > > +                     f->page_offset = base & ~PAGE_MASK;
> > > > +                     f->size = min_t(int, len, PAGE_SIZE -
> > > f->page_offset);
> > > > +                     skb_shinfo(skb)->nr_frags++;
> > > > +                     /* increase sk_wmem_alloc */
> > > > +                     atomic_add(f->size,
> &skb->sk->sk_wmem_alloc);
> > > 
> > > > One thing that gave me pause is that we only account for part of
> > > > the page here. I think we should count the whole page, no?
> > 
> > The whole page is pinned, but it might not be for this sock only. 
> 
> Right, but worst-case it is, so I think we should stay on the safe
> side
> and limit what the user can do.
Ok, let me try this out.

> > I think I should move the atomic_add outside the loop to save cost.
> 
> Sure, good idea.
> 
> > > > +                     base += f->size;
> > > > +                     len -= f->size;
> > > > +                     i++;
> > > > +             }
> > > > +             offset1 = 0;
> > > > +             ++from;
> > > > +     }
> > > > +     return 0;
> > > > +}
> > > > +
> > > >  /*
> > > >   * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should
> > > >   * be shared with the tun/tap driver.
> > > > @@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const
> > > struct sk_buff *skb,
> > > >  
> > > >  
> > > >  /* Get packet from user space buffer */
> > > > -static ssize_t macvtap_get_user(struct macvtap_queue *q,
> > > > -                             const struct iovec *iv, size_t
> count,
> > > > -                             int noblock)
> > > > +static ssize_t macvtap_get_user(struct macvtap_queue *q, struct
> > > msghdr *m,
> > > > +                             const struct iovec *iv, unsigned
> long
> > > total_len,
> > > > +                             size_t count, int noblock)
> > > >  {
> > > >       struct sk_buff *skb;
> > > >       struct macvlan_dev *vlan;
> > > > -     size_t len = count;
> > > > +     unsigned long len = total_len;
> > > >       int err;
> > > >       struct virtio_net_hdr vnet_hdr = { 0 };
> > > >       int vnet_hdr_len = 0;
> > > > +     int copylen, zerocopy;
> > > >  
> > > > +     zerocopy = sock_flag(&q->sk, SOCK_ZEROCOPY) && (len >
> > > GOODCOPY_LEN);
> > > >       if (q->flags & IFF_VNET_HDR) {
> > > >               vnet_hdr_len = q->vnet_hdr_sz;
> > > >  
> > > > @@ -552,12 +640,28 @@ static ssize_t macvtap_get_user(struct
> > > macvtap_queue *q,
> > > >       if (unlikely(len < ETH_HLEN))
> > > >               goto err;
> > > >  
> > > > -     skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, len,
> > > vnet_hdr.hdr_len,
> > > > -                             noblock, &err);
> > > > +     if (zerocopy)
> > > > +             /* There are 256 bytes to be copied in skb, so
> there
> > > is enough
> > > > +              * room for skb expand head in case it is used.
> > > > +              * The rest buffer is mapped from userspace.
> > > > +              */
> > > > +             copylen = GOODCOPY_LEN;
> > > 
> > > Just curious: where does the number 256 come from?
> > > Also, as long as we are copying, should we care about
> > > alignment?
> > 
> > 256 makes the size big enough for any skb head expanding.
> > 
> > That's my concern before.
> 
> I'm not sure I understand.
> Could you tell me how do we know 256 is big enough for any skb head
> expanding please?
I meant to make sure that in any case all headers will be in the skb,
not in frags.

> > But guest should alignment the buffer already,
> > after moving the pointer 256 bytes. It should still alignment,
> right?
> 
> I mean in the host. But whatever, it's not that important at this
> point.
> 
> > > > +     else
> > > > +             copylen = len;
> > > > +
> > > > +     skb = macvtap_alloc_skb(&q->sk, NET_IP_ALIGN, copylen,
> > > > +                             vnet_hdr.hdr_len, noblock, &err);
> > > >       if (!skb)
> > > >               goto err;
> > > >  
> > > > -     err = skb_copy_datagram_from_iovec(skb, 0, iv,
> vnet_hdr_len,
> > > len);
> > > > +     if (zerocopy)
> > > > +             err = zerocopy_sg_from_iovec(skb, iv,
> vnet_hdr_len,
> > > count);
> > > > +     else
> > > > +             err = skb_copy_datagram_from_iovec(skb, 0, iv,
> > > vnet_hdr_len,
> > > > +                                                len);
> > > > +     if (sock_flag(&q->sk, SOCK_ZEROCOPY))
> > > > +             memcpy(&skb_shinfo(skb)->ubuf, m->msg_control,
> > > > +                     sizeof(struct skb_ubuf_info));
> > > >       if (err)
> > > >               goto err_kfree;
> > > >  
> > > > @@ -579,7 +683,7 @@ static ssize_t macvtap_get_user(struct
> > > macvtap_queue *q,
> > > >               kfree_skb(skb);
> > > >       rcu_read_unlock_bh();
> > > >  
> > > > -     return count;
> > > > +     return total_len;
> > > >  
> > > >  err_kfree:
> > > >       kfree_skb(skb);
> > > > @@ -601,8 +705,8 @@ static ssize_t macvtap_aio_write(struct
> kiocb
> > > *iocb, const struct iovec *iv,
> > > >       ssize_t result = -ENOLINK;
> > > >       struct macvtap_queue *q = file->private_data;
> > > >  
> > > > -     result = macvtap_get_user(q, iv, iov_length(iv, count),
> > > > -                           file->f_flags & O_NONBLOCK);
> > > > +     result = macvtap_get_user(q, NULL, iv, iov_length(iv,
> count),
> > > count,
> > > > +                               file->f_flags & O_NONBLOCK);
> > > >       return result;
> > > >  }
> > > >  
> > > > @@ -815,7 +919,7 @@ static int macvtap_sendmsg(struct kiocb
> *iocb,
> > > struct socket *sock,
> > > >                          struct msghdr *m, size_t total_len)
> > > >  {
> > > >       struct macvtap_queue *q = container_of(sock, struct
> > > macvtap_queue, sock);
> > > > -     return macvtap_get_user(q, m->msg_iov, total_len,
> > > > +     return macvtap_get_user(q, m, m->msg_iov, total_len,
> > > m->msg_iovlen,
> > > >                           m->msg_flags & MSG_DONTWAIT);
> > > >  }
> > > >  
> > > > 
> > > 
> > > 


* Re: [PATCH] mlx4_en: Setting RSS hash result to skb->rxhash field
From: Eric Dumazet @ 2011-05-04 16:46 UTC
  To: Ben Hutchings; +Cc: Yevgeny Petrilin, davem, netdev
In-Reply-To: <1304520260.3203.34.camel@localhost>

On Wednesday, 04 May 2011 at 15:44 +0100, Ben Hutchings wrote:
> On Wed, 2011-05-04 at 16:37 +0300, Yevgeny Petrilin wrote:
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > ---
> >  drivers/net/mlx4/en_rx.c |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/mlx4/en_rx.c b/drivers/net/mlx4/en_rx.c
> > index 62dd21b..bb4d66a 100644
> > --- a/drivers/net/mlx4/en_rx.c
> > +++ b/drivers/net/mlx4/en_rx.c
> > @@ -610,6 +610,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >  					gro_skb->data_len = length;
> >  					gro_skb->truesize += length;
> >  					gro_skb->ip_summed = CHECKSUM_UNNECESSARY;
> > +					gro_skb->rxhash = be32_to_cpu(cqe->immed_rss_invalid) << 24;
> > +					skb_record_rx_queue(gro_skb, cq->ring);
> 
> An 8-bit hash is almost useless.  It's entirely useless if you then
> shift it into the top bits of rxhash.
> 

Agreed. This is very bad.

Yevgeny probably did this shift because get_rps_cpu() does:

tcpu = map->cpus[((u64) skb->rxhash * map->len) >> 32];

(If rxhash is not a pure random u32 distribution, then high order bits
are more important than low order bits)
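
As a quick illustration of that line, here is a tiny user-space sketch of
the same scaling. The function name rps_index is made up for the example;
only the arithmetic is taken from the expression quoted above.

```c
#include <stdint.h>

/*
 * Scale a 32-bit hash onto the range [0, len), using the same
 * multiply-and-shift as the get_rps_cpu() expression quoted above.
 * The result depends mostly on the high-order bits of rxhash.
 */
uint32_t rps_index(uint32_t rxhash, uint32_t len)
{
	return (uint32_t)(((uint64_t)rxhash * len) >> 32);
}
```

With len = 8, an 8-bit value left in the low bits (for example 0x000000ff)
always selects index 0, while the same value shifted into the top byte
(0xff000000) selects index 7. So the shift does spread packets over the
map, but the hash still carries only 8 bits of information.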





* [PATCH -next] x86/net: only select BPF_JIT when NET is enabled
From: Randy Dunlap @ 2011-05-04 16:56 UTC
  To: Stephen Rothwell, netdev; +Cc: linux-next, LKML, davem
In-Reply-To: <20110504144759.802483cc.sfr@canb.auug.org.au>

From: Randy Dunlap <randy.dunlap@oracle.com>

Fix kconfig unmet dependency warning: HAVE_BPF_JIT depends on NET, so
make the "select" of it depend on NET also.

warning: (X86) selects HAVE_BPF_JIT which has unmet direct dependencies (NET)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
---
 arch/x86/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20110503.orig/arch/x86/Kconfig
+++ linux-next-20110503/arch/x86/Kconfig
@@ -71,7 +71,7 @@ config X86
 	select GENERIC_IRQ_SHOW
 	select IRQ_FORCED_THREADING
 	select USE_GENERIC_SMP_HELPERS if SMP
-	select HAVE_BPF_JIT if X86_64
+	select HAVE_BPF_JIT if (X86_64 && NET)
 
 config INSTRUCTION_DECODER
 	def_bool (KPROBES || PERF_EVENTS)


* Re: 2.6.38.2, kernel panic, probably related to framentation handling
From: Eric Dumazet @ 2011-05-04 17:03 UTC
  To: Denys Fedoryshchenko; +Cc: netdev
In-Reply-To: <f2c5863250c7a5fb11a189759bd736cf@visp.net.lb>

On Wednesday, 04 May 2011 at 14:36 +0300, Denys Fedoryshchenko wrote:
> It seems that once more, while trying to bring up another type of tunnel
> (this time userspace, working over a tun device) and switching routes, I
> got one more kernel panic.
> It is a vanilla kernel, but with many source routing rules, firewall, QoS
> etc., now including this tunnel as well. Here is what I got on netconsole.
> Is any other info required?
> 
>  netc [1192230.881002]
>  netc [1192230.881002] Pid: 0, comm: kworker/0:1 Not tainted 2.6.38.2-devel2 #2 Dell Inc. PowerEdge 1950/0D8635
>  netc [1192230.881002] EIP: 0060:[<c03c0847>] EFLAGS: 00010206 CPU: 3
>  netc [1192230.881002] EIP is at icmp_send+0x39/0x396
>  netc [1192230.881002] EAX: 121a8aca EBX: d1d28600 ECX: 00000001 EDX: c63b6600
>  netc [1192230.881002] ESI: d1d28600 EDI: c33438a0 EBP: f2a41840 ESP: f64b5e8c
>  netc [1192230.881002]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>  netc [1192230.881002] Process kworker/0:1 (pid: 0, ti=f64b4000 task=f64a4a80 task.ti=f64b0000)
>  netc [1192230.881002] Stack:
>  netc [1192230.881002]  c0113ea1 00000001 0000000b 00000000 efed48c7 0000114a 00000000 f6b01fa8
>  netc [1192230.881002]  e21be5e1 c0148bf3 e217d5d0 00043c53 00000001 f6b02d14 e21be5e1 00043c53
>  netc [1192230.881002]  e21be5e1 00043c53 c0148d2e 00000000 00000058 00000000 c0140779 ce9f5aa9
>  netc [1192230.881002] Call Trace:
>  netc [1192230.881002]  [<c0113ea1>] ? lapic_next_event+0x13/0x16
>  netc [1192230.881002]  [<c0148bf3>] ? tick_dev_program_event+0x26/0x116
>  netc [1192230.881002]  [<c0148d2e>] ? tick_program_event+0x1b/0x1f
>  netc [1192230.881002]  [<c0140779>] ? hrtimer_interrupt+0x10c/0x1ca
>  netc [1192230.881002]  [<c0140e49>] ? hrtimer_start+0x20/0x25
>  netc [1192230.881002]  [<c012f18e>] ? irq_exit+0x36/0x59
>  netc [1192230.881002]  [<c0114933>] ? smp_apic_timer_interrupt+0x71/0x7d
>  netc [1192230.881002]  [<c03f2752>] ? apic_timer_interrupt+0x2a/0x30
>  netc [1192230.881002]  [<c039f527>] ? ip_expire+0xf2/0x11b
>  netc [1192230.881002]  [<c039f435>] ? ip_expire+0x0/0x11b
>  netc [1192230.881002]  [<c0133421>] ? run_timer_softirq+0x140/0x1c7
>  netc [1192230.881002]  [<c012f28f>] ? __do_softirq+0x6b/0x104
>  netc [1192230.881002]  [<c012f224>] ? __do_softirq+0x0/0x104
>  netc [1192230.881002]  [<c012f224>] ? __do_softirq+0x0/0x104
>  netc [1192230.881002]  <IRQ>
>  netc
>  netc [1192230.881002]  [<c012f17e>] ? irq_exit+0x26/0x59
>  netc [1192230.881002]  [<c0103b3d>] ? do_IRQ+0x81/0x95
>  netc [1192230.881002]  [<c0114933>] ? 
>  smp_apic_timer_interrupt+0x71/0x7d
>  netc [1192230.881002]  [<c0102ca9>] ? common_interrupt+0x29/0x30
>  netc [1192230.881002]  [<c010807a>] ? mwait_idle+0x51/0x56
>  netc [1192230.881002]  [<c0101a97>] ? cpu_idle+0x41/0x5e
>  netc [1192230.881002] Code: 08 89 c6 89 4c 24 04 8b 40 48 89 c2 83 e2 fe 0f 84 66 03 00 00
>  89 94 24 c0 00 00 00 8b 42 0c 8b be 94 00 00 00 3b be a4 00 00 00 <8b> 80 80 02
>  00 00 89 44 24 10 0f 82 40 03 00 00 8d 47 14 39 86
>  netc [1192230.881002] EIP: [<c03c0847>] icmp_send+0x39/0x396 SS:ESP 0068:f64b5e8c
>  netc [1192230.881002] CR2: 00000000121a8d4a
>  netc [1192230.910072] ---[ end trace 42aae79d7fb08725 ]---
>  netc [1192230.910354] Kernel panic - not syncing: Fatal exception in 
>  interrupt
>  netc [1192230.911062] Rebooting in 5 seconds..
> 

Hi Denys

Is it reproducible, and possibly on the latest kernel?

We fixed some bugs lately (assuming you also use a bridge?)

Could you send the disassembly of icmp_send() from your kernel?

Thanks !
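
As a quick cross-check of the register dump against the Code: bytes, here
is a minimal sketch. It assumes the marked bytes "<8b> 80 80 02 00 00"
decode as "mov 0x280(%eax),%eax" (the usual ia32 ModRM encoding; nobody in
this thread has confirmed it from a disassembly), and fault_address() is a
hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: effective address of "mov disp32(%eax),%eax",
 * given EAX and the four little-endian displacement bytes that follow
 * the ModRM byte in the instruction stream.
 */
static uint32_t fault_address(uint32_t eax, const uint8_t disp[4])
{
	uint32_t d = (uint32_t)disp[0] |
		     ((uint32_t)disp[1] << 8) |
		     ((uint32_t)disp[2] << 16) |
		     ((uint32_t)disp[3] << 24);

	return eax + d;
}
```

With EAX = 121a8aca and displacement bytes 80 02 00 00 from the oops, the
result is 121a8d4a, which matches the reported CR2. Under the decoding
assumption above, that would point at EAX holding a bad pointer at
icmp_send+0x39, but the real disassembly is still needed to say which
pointer that is.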



^ permalink raw reply

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Dimitris Michailidis @ 2011-05-04 17:09 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Alexander Duyck, davem, jeffrey.t.kirsher, netdev
In-Reply-To: <1304465684.2873.26.camel@bwh-desktop>

On 05/03/2011 04:34 PM, Ben Hutchings wrote:
> On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
>> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where there is 
>> enough knowledge to pick an appropriate slot.  So I'd remove the
>>
>>      if (loc == RX_CLS_LOC_UNSPEC)
>>
>> block above, let the driver pick a slot, and then pass the selected location 
>> back for ethtool to report.
> 
> But first we have to specify this in the ethtool API.  So please propose
> a patch to ethtool.h.

In the past we discussed that being able to specify the first available slot or 
the last available would be useful, so something like the below?

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 4194a20..909ef79 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -442,7 +442,8 @@ struct ethtool_flow_ext {
   *	includes the %FLOW_EXT flag.
   * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
   *	if packets should be discarded
- * @location: Index of filter in hardware table
+ * @location: Index of filter in hardware table, or %RX_CLS_FLOW_FIRST_LOC for
+ *	first available index, or %RX_CLS_FLOW_LAST_LOC for last available
   */
  struct ethtool_rx_flow_spec {
  	__u32		flow_type;
@@ -1142,6 +1143,10 @@ struct ethtool_ops {

  #define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL

+/* special values for ethtool_rx_flow_spec.location */
+#define RX_CLS_FLOW_FIRST_LOC	0xffffffffU
+#define RX_CLS_FLOW_LAST_LOC	0xfffffffeU
+
  /* Reset flags */
  /* The reset() operation must clear the flags for the components which
   * were actually reset.  On successful return, the flags indicate the
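
To illustrate how a driver might consume these two special values, here is
a rough user-space sketch. Only the two RX_CLS_FLOW_*_LOC names and values
come from the proposal above (written without a trailing semicolon so they
can be used inside expressions). The table size, DEMO_NO_SLOT and
demo_resolve_location() are made up for illustration, and treating an
explicit in-range index as "use exactly this slot" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Special values proposed for ethtool_rx_flow_spec.location. */
#define RX_CLS_FLOW_FIRST_LOC	0xffffffffU
#define RX_CLS_FLOW_LAST_LOC	0xfffffffeU

/* Hypothetical table size and error value, for illustration only. */
#define DEMO_TABLE_SIZE		8U
#define DEMO_NO_SLOT		(-1)

/* Resolve a requested location to a concrete index, or DEMO_NO_SLOT. */
static int demo_resolve_location(const bool used[DEMO_TABLE_SIZE], uint32_t loc)
{
	uint32_t i;

	if (loc == RX_CLS_FLOW_FIRST_LOC) {
		/* Scan upwards for the first free slot. */
		for (i = 0; i < DEMO_TABLE_SIZE; i++)
			if (!used[i])
				return (int)i;
		return DEMO_NO_SLOT;
	}
	if (loc == RX_CLS_FLOW_LAST_LOC) {
		/* Scan downwards for the last free slot. */
		for (i = DEMO_TABLE_SIZE; i-- > 0; )
			if (!used[i])
				return (int)i;
		return DEMO_NO_SLOT;
	}
	/* Explicit index: assumed to name the slot directly if in range. */
	if (loc < DEMO_TABLE_SIZE)
		return (int)loc;
	return DEMO_NO_SLOT;
}
```

The resolved index is what the driver would then report back, so that
ethtool can print the location that was actually selected.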

^ permalink raw reply related

* [PATCH 0/8] More genericization of struct rtable
From: David Miller @ 2011-05-04 17:14 UTC (permalink / raw)
  To: netdev


This patch set does more work to minimize or completely eliminate
various fields of struct rtable, with the goal of making the
structure as generic and non-specific as possible.

First we reduce rt->rt_tos such that it is only used as a key
for routing cache lookups, and rename it rt->rt_key_tos to
reflect this fact.

Next come some output route lookup interface tweaks, leading to the
removal of several rt->rt_{dst,src} accesses, which are replaced
with flowi4->{daddr,saddr}.

More to come.

^ permalink raw reply

* [PATCH 1/8] ipv4: Rework ipmr_rt_fib_lookup() flow key initialization.
From: David Miller @ 2011-05-04 17:14 UTC (permalink / raw)
  To: netdev


Use information from the skb as much as possible; currently
this means daddr, saddr, and TOS.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/ipmr.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index c81b9b6..3ad38a4 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1788,12 +1788,14 @@ dont_forward:
 	return 0;
 }
 
-static struct mr_table *ipmr_rt_fib_lookup(struct net *net, struct rtable *rt)
+static struct mr_table *ipmr_rt_fib_lookup(struct net *net, struct sk_buff *skb)
 {
+	struct rtable *rt = skb_rtable(skb);
+	struct iphdr *iph = ip_hdr(skb);
 	struct flowi4 fl4 = {
-		.daddr = rt->rt_key_dst,
-		.saddr = rt->rt_key_src,
-		.flowi4_tos = rt->rt_tos,
+		.daddr = iph->daddr,
+		.saddr = iph->saddr,
+		.flowi4_tos = iph->tos,
 		.flowi4_oif = rt->rt_oif,
 		.flowi4_iif = rt->rt_iif,
 		.flowi4_mark = rt->rt_mark,
@@ -1825,7 +1827,7 @@ int ip_mr_input(struct sk_buff *skb)
 	if (IPCB(skb)->flags & IPSKB_FORWARDED)
 		goto dont_forward;
 
-	mrt = ipmr_rt_fib_lookup(net, skb_rtable(skb));
+	mrt = ipmr_rt_fib_lookup(net, skb);
 	if (IS_ERR(mrt)) {
 		kfree_skb(skb);
 		return PTR_ERR(mrt);
@@ -1957,7 +1959,7 @@ int pim_rcv_v1(struct sk_buff *skb)
 
 	pim = igmp_hdr(skb);
 
-	mrt = ipmr_rt_fib_lookup(net, skb_rtable(skb));
+	mrt = ipmr_rt_fib_lookup(net, skb);
 	if (IS_ERR(mrt))
 		goto drop;
 	if (!mrt->mroute_do_pim ||
@@ -1989,7 +1991,7 @@ static int pim_rcv(struct sk_buff *skb)
 	     csum_fold(skb_checksum(skb, 0, skb->len, 0))))
 		goto drop;
 
-	mrt = ipmr_rt_fib_lookup(net, skb_rtable(skb));
+	mrt = ipmr_rt_fib_lookup(net, skb);
 	if (IS_ERR(mrt))
 		goto drop;
 	if (__pim_rcv(mrt, skb, sizeof(*pim))) {
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 2/8] ipv4: Rename struct rtable's rt_tos to rt_key_tos.
From: David Miller @ 2011-05-04 17:14 UTC (permalink / raw)
  To: netdev


To more accurately reflect that it is purely a routing
cache lookup key and is used in no other context.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h     |    2 +-
 net/ipv4/route.c        |   24 ++++++++++++------------
 net/ipv4/xfrm4_policy.c |    2 +-
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 16eb59c..f07609e 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -52,7 +52,7 @@ struct rtable {
 	int			rt_genid;
 	unsigned		rt_flags;
 	__u16			rt_type;
-	__u8			rt_tos;
+	__u8			rt_key_tos;
 
 	__be32			rt_dst;	/* Path destination	*/
 	__be32			rt_src;	/* Path source		*/
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 64f360d..3bc6854 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -424,7 +424,7 @@ static int rt_cache_seq_show(struct seq_file *seq, void *v)
 			dst_metric(&r->dst, RTAX_WINDOW),
 			(int)((dst_metric(&r->dst, RTAX_RTT) >> 3) +
 			      dst_metric(&r->dst, RTAX_RTTVAR)),
-			r->rt_tos,
+			r->rt_key_tos,
 			r->dst.hh ? atomic_read(&r->dst.hh->hh_refcnt) : -1,
 			r->dst.hh ? (r->dst.hh->hh_output ==
 				       dev_queue_xmit) : 0,
@@ -724,7 +724,7 @@ static inline int compare_keys(struct rtable *rt1, struct rtable *rt2)
 	return (((__force u32)rt1->rt_key_dst ^ (__force u32)rt2->rt_key_dst) |
 		((__force u32)rt1->rt_key_src ^ (__force u32)rt2->rt_key_src) |
 		(rt1->rt_mark ^ rt2->rt_mark) |
-		(rt1->rt_tos ^ rt2->rt_tos) |
+		(rt1->rt_key_tos ^ rt2->rt_key_tos) |
 		(rt1->rt_oif ^ rt2->rt_oif) |
 		(rt1->rt_iif ^ rt2->rt_iif)) == 0;
 }
@@ -1349,7 +1349,7 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
 						rt_genid(dev_net(dst->dev)));
 #if RT_CACHE_DEBUG >= 1
 			printk(KERN_DEBUG "ipv4_negative_advice: redirect to %pI4/%02x dropped\n",
-				&rt->rt_dst, rt->rt_tos);
+				&rt->rt_dst, rt->rt_key_tos);
 #endif
 			rt_del(hash, rt);
 			ret = NULL;
@@ -1710,7 +1710,7 @@ void ip_rt_get_source(u8 *addr, struct rtable *rt)
 		struct flowi4 fl4 = {
 			.daddr = rt->rt_key_dst,
 			.saddr = rt->rt_key_src,
-			.flowi4_tos = rt->rt_tos,
+			.flowi4_tos = rt->rt_key_tos,
 			.flowi4_oif = rt->rt_oif,
 			.flowi4_iif = rt->rt_iif,
 			.flowi4_mark = rt->rt_mark,
@@ -1886,7 +1886,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	rth->rt_genid	= rt_genid(dev_net(dev));
 	rth->rt_flags	= RTCF_MULTICAST;
 	rth->rt_type	= RTN_MULTICAST;
-	rth->rt_tos	= tos;
+	rth->rt_key_tos	= tos;
 	rth->rt_dst	= daddr;
 	rth->rt_src	= saddr;
 	rth->rt_route_iif = dev->ifindex;
@@ -2023,7 +2023,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->rt_genid = rt_genid(dev_net(rth->dst.dev));
 	rth->rt_flags = flags;
 	rth->rt_type = res->type;
-	rth->rt_tos	= tos;
+	rth->rt_key_tos	= tos;
 	rth->rt_dst	= daddr;
 	rth->rt_src	= saddr;
 	rth->rt_route_iif = in_dev->dev->ifindex;
@@ -2203,7 +2203,7 @@ local_input:
 	rth->rt_genid = rt_genid(net);
 	rth->rt_flags 	= flags|RTCF_LOCAL;
 	rth->rt_type	= res.type;
-	rth->rt_tos	= tos;
+	rth->rt_key_tos	= tos;
 	rth->rt_dst	= daddr;
 	rth->rt_src	= saddr;
 #ifdef CONFIG_IP_ROUTE_CLASSID
@@ -2293,7 +2293,7 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		     ((__force u32)rth->rt_key_src ^ (__force u32)saddr) |
 		     (rth->rt_iif ^ iif) |
 		     rth->rt_oif |
-		     (rth->rt_tos ^ tos)) == 0 &&
+		     (rth->rt_key_tos ^ tos)) == 0 &&
 		    rth->rt_mark == skb->mark &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
@@ -2410,7 +2410,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	rth->rt_genid = rt_genid(dev_net(dev_out));
 	rth->rt_flags	= flags;
 	rth->rt_type	= type;
-	rth->rt_tos	= tos;
+	rth->rt_key_tos	= tos;
 	rth->rt_dst	= fl4->daddr;
 	rth->rt_src	= fl4->saddr;
 	rth->rt_route_iif = 0;
@@ -2668,7 +2668,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4)
 		    rt_is_output_route(rth) &&
 		    rth->rt_oif == flp4->flowi4_oif &&
 		    rth->rt_mark == flp4->flowi4_mark &&
-		    !((rth->rt_tos ^ flp4->flowi4_tos) &
+		    !((rth->rt_key_tos ^ flp4->flowi4_tos) &
 			    (IPTOS_RT_MASK | RTO_ONLINK)) &&
 		    net_eq(dev_net(rth->dst.dev), net) &&
 		    !rt_is_expired(rth)) {
@@ -2740,7 +2740,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
 
 		rt->rt_key_dst = ort->rt_key_dst;
 		rt->rt_key_src = ort->rt_key_src;
-		rt->rt_tos = ort->rt_tos;
+		rt->rt_key_tos = ort->rt_key_tos;
 		rt->rt_route_iif = ort->rt_route_iif;
 		rt->rt_iif = ort->rt_iif;
 		rt->rt_oif = ort->rt_oif;
@@ -2803,7 +2803,7 @@ static int rt_fill_info(struct net *net,
 	r->rtm_family	 = AF_INET;
 	r->rtm_dst_len	= 32;
 	r->rtm_src_len	= 0;
-	r->rtm_tos	= rt->rt_tos;
+	r->rtm_tos	= rt->rt_key_tos;
 	r->rtm_table	= RT_TABLE_MAIN;
 	NLA_PUT_U32(skb, RTA_TABLE, RT_TABLE_MAIN);
 	r->rtm_type	= rt->rt_type;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 59b1340..7ff973b 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -73,7 +73,7 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 
 	rt->rt_key_dst = fl4->daddr;
 	rt->rt_key_src = fl4->saddr;
-	rt->rt_tos = fl4->flowi4_tos;
+	rt->rt_key_tos = fl4->flowi4_tos;
 	rt->rt_route_iif = fl4->flowi4_iif;
 	rt->rt_iif = fl4->flowi4_iif;
 	rt->rt_oif = fl4->flowi4_oif;
-- 
1.7.4.5
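
For readers unfamiliar with the key comparison touched in compare_keys(),
here is a small user-space sketch of the same XOR/OR idiom. The struct is
a simplified stand-in (struct demo_rt_key and demo_compare_keys() are
hypothetical names, and the field types are approximations), not the real
struct rtable.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the lookup-key fields of struct rtable. */
struct demo_rt_key {
	uint32_t key_dst;
	uint32_t key_src;
	uint32_t mark;
	uint8_t  key_tos;
	int      oif;
	int      iif;
};

/*
 * Returns nonzero when every key field matches.  Each XOR is zero only
 * when the two fields are equal, so OR-ing the XORs together yields zero
 * only if all fields are equal.
 */
static int demo_compare_keys(const struct demo_rt_key *a,
			     const struct demo_rt_key *b)
{
	return ((a->key_dst ^ b->key_dst) |
		(a->key_src ^ b->key_src) |
		(a->mark ^ b->mark) |
		(uint32_t)(a->key_tos ^ b->key_tos) |
		(uint32_t)(a->oif ^ b->oif) |
		(uint32_t)(a->iif ^ b->iif)) == 0;
}
```

Since the TOS value takes part in this comparison as a lookup key like the
other fields, the rt_key_tos name lines up with rt_key_dst and rt_key_src.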


^ permalink raw reply related

* [PATCH 3/8] dccp: Use flowi4->saddr in dccp_v4_connect()
From: David Miller @ 2011-05-04 17:14 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/dccp/ipv4.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index f4254bb..36700a4 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -86,7 +86,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		daddr = fl4.daddr;
 
 	if (inet->inet_saddr == 0)
-		inet->inet_saddr = rt->rt_src;
+		inet->inet_saddr = fl4.saddr;
 	inet->inet_rcv_saddr = inet->inet_saddr;
 
 	inet->inet_dport = usin->sin_port;
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 4/8] ipv4: Make caller provide on-stack flow key to ip_route_output_ports().
From: David Miller @ 2011-05-04 17:14 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/infiniband/hw/cxgb3/iwch_cm.c |    3 ++-
 drivers/infiniband/hw/cxgb4/cm.c      |    3 ++-
 drivers/net/pptp.c                    |    6 ++++--
 drivers/scsi/cxgbi/libcxgbi.c         |    3 ++-
 include/net/route.h                   |   11 +++++------
 net/ipv4/af_inet.c                    |    3 ++-
 net/ipv4/igmp.c                       |    6 ++++--
 net/ipv4/ip_output.c                  |    3 ++-
 net/ipv4/ipip.c                       |   19 +++++++++++--------
 net/ipv4/ipmr.c                       |    5 +++--
 net/ipv6/ip6_tunnel.c                 |    5 +++--
 net/ipv6/sit.c                        |    6 ++++--
 net/l2tp/l2tp_ip.c                    |    3 ++-
 net/rxrpc/ar-peer.c                   |    3 ++-
 14 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 3216bca..2391841 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -338,8 +338,9 @@ static struct rtable *find_route(struct t3cdev *dev, __be32 local_ip,
 				 __be16 peer_port, u8 tos)
 {
 	struct rtable *rt;
+	struct flowi4 fl4;
 
-	rt = ip_route_output_ports(&init_net, NULL, peer_ip, local_ip,
+	rt = ip_route_output_ports(&init_net, &fl4, NULL, peer_ip, local_ip,
 				   peer_port, local_port, IPPROTO_TCP,
 				   tos, 0);
 	if (IS_ERR(rt))
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 9d8dcfa..6aa53cd 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -315,8 +315,9 @@ static struct rtable *find_route(struct c4iw_dev *dev, __be32 local_ip,
 				 __be16 peer_port, u8 tos)
 {
 	struct rtable *rt;
+	struct flowi4 fl4;
 
-	rt = ip_route_output_ports(&init_net, NULL, peer_ip, local_ip,
+	rt = ip_route_output_ports(&init_net, &fl4, NULL, peer_ip, local_ip,
 				   peer_port, local_port, IPPROTO_TCP,
 				   tos, 0);
 	if (IS_ERR(rt))
diff --git a/drivers/net/pptp.c b/drivers/net/pptp.c
index 51dfcf8..e771e8d 100644
--- a/drivers/net/pptp.c
+++ b/drivers/net/pptp.c
@@ -175,6 +175,7 @@ static int pptp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
 	struct pptp_opt *opt = &po->proto.pptp;
 	struct pptp_gre_header *hdr;
 	unsigned int header_len = sizeof(*hdr);
+	struct flowi4 fl4;
 	int islcp;
 	int len;
 	unsigned char *data;
@@ -189,7 +190,7 @@ static int pptp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
 	if (sk_pppox(po)->sk_state & PPPOX_DEAD)
 		goto tx_error;
 
-	rt = ip_route_output_ports(&init_net, NULL,
+	rt = ip_route_output_ports(&init_net, &fl4, NULL,
 				   opt->dst_addr.sin_addr.s_addr,
 				   opt->src_addr.sin_addr.s_addr,
 				   0, 0, IPPROTO_GRE,
@@ -434,6 +435,7 @@ static int pptp_connect(struct socket *sock, struct sockaddr *uservaddr,
 	struct pppox_sock *po = pppox_sk(sk);
 	struct pptp_opt *opt = &po->proto.pptp;
 	struct rtable *rt;
+	struct flowi4 fl4;
 	int error = 0;
 
 	if (sp->sa_protocol != PX_PROTO_PPTP)
@@ -463,7 +465,7 @@ static int pptp_connect(struct socket *sock, struct sockaddr *uservaddr,
 	po->chan.private = sk;
 	po->chan.ops = &pptp_chan_ops;
 
-	rt = ip_route_output_ports(&init_net, sk,
+	rt = ip_route_output_ports(&init_net, &fl4, sk,
 				   opt->dst_addr.sin_addr.s_addr,
 				   opt->src_addr.sin_addr.s_addr,
 				   0, 0,
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index de764ea..0c33d25 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -454,8 +454,9 @@ static struct rtable *find_route_ipv4(__be32 saddr, __be32 daddr,
 				      __be16 sport, __be16 dport, u8 tos)
 {
 	struct rtable *rt;
+	struct flowi4 fl4;
 
-	rt = ip_route_output_ports(&init_net, NULL, daddr, saddr,
+	rt = ip_route_output_ports(&init_net, &fl4, NULL, daddr, saddr,
 				   dport, sport, IPPROTO_TCP, tos, 0);
 	if (IS_ERR(rt))
 		return NULL;
diff --git a/include/net/route.h b/include/net/route.h
index f07609e..8c02c87 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -137,20 +137,19 @@ static inline struct rtable *ip_route_output(struct net *net, __be32 daddr,
 	return ip_route_output_key(net, &fl4);
 }
 
-static inline struct rtable *ip_route_output_ports(struct net *net, struct sock *sk,
+static inline struct rtable *ip_route_output_ports(struct net *net, struct flowi4 *fl4,
+						   struct sock *sk,
 						   __be32 daddr, __be32 saddr,
 						   __be16 dport, __be16 sport,
 						   __u8 proto, __u8 tos, int oif)
 {
-	struct flowi4 fl4;
-
-	flowi4_init_output(&fl4, oif, sk ? sk->sk_mark : 0, tos,
+	flowi4_init_output(fl4, oif, sk ? sk->sk_mark : 0, tos,
 			   RT_SCOPE_UNIVERSE, proto,
 			   sk ? inet_sk_flowi_flags(sk) : 0,
 			   daddr, saddr, dport, sport);
 	if (sk)
-		security_sk_classify_flow(sk, flowi4_to_flowi(&fl4));
-	return ip_route_output_flow(net, &fl4, sk);
+		security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
+	return ip_route_output_flow(net, fl4, sk);
 }
 
 static inline struct rtable *ip_route_output_gre(struct net *net,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4e73499..7b91fa8 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1152,6 +1152,7 @@ int inet_sk_rebuild_header(struct sock *sk)
 	struct rtable *rt = (struct rtable *)__sk_dst_check(sk, 0);
 	__be32 daddr;
 	struct ip_options_rcu *inet_opt;
+	struct flowi4 fl4;
 	int err;
 
 	/* Route is OK, nothing to do. */
@@ -1165,7 +1166,7 @@ int inet_sk_rebuild_header(struct sock *sk)
 	if (inet_opt && inet_opt->opt.srr)
 		daddr = inet_opt->opt.faddr;
 	rcu_read_unlock();
-	rt = ip_route_output_ports(sock_net(sk), sk, daddr, inet->inet_saddr,
+	rt = ip_route_output_ports(sock_net(sk), &fl4, sk, daddr, inet->inet_saddr,
 				   inet->inet_dport, inet->inet_sport,
 				   sk->sk_protocol, RT_CONN_FLAGS(sk),
 				   sk->sk_bound_dev_if);
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 8ae0a57..7c2ef59 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -309,6 +309,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 	struct iphdr *pip;
 	struct igmpv3_report *pig;
 	struct net *net = dev_net(dev);
+	struct flowi4 fl4;
 
 	while (1) {
 		skb = alloc_skb(size + LL_ALLOCATED_SPACE(dev),
@@ -321,7 +322,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 	}
 	igmp_skb_size(skb) = size;
 
-	rt = ip_route_output_ports(net, NULL, IGMPV3_ALL_MCR, 0,
+	rt = ip_route_output_ports(net, &fl4, NULL, IGMPV3_ALL_MCR, 0,
 				   0, 0,
 				   IPPROTO_IGMP, 0, dev->ifindex);
 	if (IS_ERR(rt)) {
@@ -650,6 +651,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc,
 	struct net_device *dev = in_dev->dev;
 	struct net *net = dev_net(dev);
 	__be32	group = pmc ? pmc->multiaddr : 0;
+	struct flowi4 fl4;
 	__be32	dst;
 
 	if (type == IGMPV3_HOST_MEMBERSHIP_REPORT)
@@ -659,7 +661,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc,
 	else
 		dst = group;
 
-	rt = ip_route_output_ports(net, NULL, dst, 0,
+	rt = ip_route_output_ports(net, &fl4, NULL, dst, 0,
 				   0, 0,
 				   IPPROTO_IGMP, 0, dev->ifindex);
 	if (IS_ERR(rt))
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 362e66f..3aa4c31 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -333,6 +333,7 @@ int ip_queue_xmit(struct sk_buff *skb)
 	/* Make sure we can route this packet. */
 	rt = (struct rtable *)__sk_dst_check(sk, 0);
 	if (rt == NULL) {
+		struct flowi4 fl4;
 		__be32 daddr;
 
 		/* Use correct destination address if we have options. */
@@ -344,7 +345,7 @@ int ip_queue_xmit(struct sk_buff *skb)
 		 * keep trying until route appears or the connection times
 		 * itself out.
 		 */
-		rt = ip_route_output_ports(sock_net(sk), sk,
+		rt = ip_route_output_ports(sock_net(sk), &fl4, sk,
 					   daddr, inet->inet_saddr,
 					   inet->inet_dport,
 					   inet->inet_sport,
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index ef16377..88d96bd 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -442,6 +442,7 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct iphdr  *iph;			/* Our new IP header */
 	unsigned int max_headroom;		/* The extra header space needed */
 	__be32 dst = tiph->daddr;
+	struct flowi4 fl4;
 	int    mtu;
 
 	if (skb->protocol != htons(ETH_P_IP))
@@ -460,7 +461,7 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 			goto tx_error_icmp;
 	}
 
-	rt = ip_route_output_ports(dev_net(dev), NULL,
+	rt = ip_route_output_ports(dev_net(dev), &fl4, NULL,
 				   dst, tiph->saddr,
 				   0, 0,
 				   IPPROTO_IPIP, RT_TOS(tos),
@@ -578,13 +579,15 @@ static void ipip_tunnel_bind_dev(struct net_device *dev)
 	iph = &tunnel->parms.iph;
 
 	if (iph->daddr) {
-		struct rtable *rt = ip_route_output_ports(dev_net(dev), NULL,
-							  iph->daddr, iph->saddr,
-							  0, 0,
-							  IPPROTO_IPIP,
-							  RT_TOS(iph->tos),
-							  tunnel->parms.link);
-
+		struct rtable *rt;
+		struct flowi4 fl4;
+
+		rt = ip_route_output_ports(dev_net(dev), &fl4, NULL,
+					   iph->daddr, iph->saddr,
+					   0, 0,
+					   IPPROTO_IPIP,
+					   RT_TOS(iph->tos),
+					   tunnel->parms.link);
 		if (!IS_ERR(rt)) {
 			tdev = rt->dst.dev;
 			ip_rt_put(rt);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 3ad38a4..86033b7 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1595,6 +1595,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 	struct vif_device *vif = &mrt->vif_table[vifi];
 	struct net_device *dev;
 	struct rtable *rt;
+	struct flowi4 fl4;
 	int    encap = 0;
 
 	if (vif->dev == NULL)
@@ -1612,7 +1613,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 #endif
 
 	if (vif->flags & VIFF_TUNNEL) {
-		rt = ip_route_output_ports(net, NULL,
+		rt = ip_route_output_ports(net, &fl4, NULL,
 					   vif->remote, vif->local,
 					   0, 0,
 					   IPPROTO_IPIP,
@@ -1621,7 +1622,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 			goto out_free;
 		encap = sizeof(struct iphdr);
 	} else {
-		rt = ip_route_output_ports(net, NULL, iph->daddr, 0,
+		rt = ip_route_output_ports(net, &fl4, NULL, iph->daddr, 0,
 					   0, 0,
 					   IPPROTO_IPIP,
 					   RT_TOS(iph->tos), vif->link);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 9dd0e96..3dff27c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -537,6 +537,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	struct sk_buff *skb2;
 	const struct iphdr *eiph;
 	struct rtable *rt;
+	struct flowi4 fl4;
 
 	err = ip6_tnl_err(skb, IPPROTO_IPIP, opt, &rel_type, &rel_code,
 			  &rel_msg, &rel_info, offset);
@@ -577,7 +578,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	eiph = ip_hdr(skb2);
 
 	/* Try to guess incoming interface */
-	rt = ip_route_output_ports(dev_net(skb->dev), NULL,
+	rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL,
 				   eiph->saddr, 0,
 				   0, 0,
 				   IPPROTO_IPIP, RT_TOS(eiph->tos), 0);
@@ -590,7 +591,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (rt->rt_flags & RTCF_LOCAL) {
 		ip_rt_put(rt);
 		rt = NULL;
-		rt = ip_route_output_ports(dev_net(skb->dev), NULL,
+		rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL,
 					   eiph->daddr, eiph->saddr,
 					   0, 0,
 					   IPPROTO_IPIP,
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 34d8964..a24fb14 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -674,6 +674,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 	struct iphdr  *iph;			/* Our new IP header */
 	unsigned int max_headroom;		/* The extra header space needed */
 	__be32 dst = tiph->daddr;
+	struct flowi4 fl4;
 	int    mtu;
 	const struct in6_addr *addr6;
 	int addr_type;
@@ -733,7 +734,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		dst = addr6->s6_addr32[3];
 	}
 
-	rt = ip_route_output_ports(dev_net(dev), NULL,
+	rt = ip_route_output_ports(dev_net(dev), &fl4, NULL,
 				   dst, tiph->saddr,
 				   0, 0,
 				   IPPROTO_IPV6, RT_TOS(tos),
@@ -851,12 +852,13 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev)
 	struct net_device *tdev = NULL;
 	struct ip_tunnel *tunnel;
 	const struct iphdr *iph;
+	struct flowi4 fl4;
 
 	tunnel = netdev_priv(dev);
 	iph = &tunnel->parms.iph;
 
 	if (iph->daddr) {
-		struct rtable *rt = ip_route_output_ports(dev_net(dev), NULL,
+		struct rtable *rt = ip_route_output_ports(dev_net(dev), &fl4, NULL,
 							  iph->daddr, iph->saddr,
 							  0, 0,
 							  IPPROTO_IPV6,
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index a4d2dfa..8189960 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -471,6 +471,7 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 
 	if (rt == NULL) {
 		struct ip_options_rcu *inet_opt;
+		struct flowi4 fl4;
 
 		rcu_read_lock();
 		inet_opt = rcu_dereference(inet->inet_opt);
@@ -485,7 +486,7 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 		 * keep trying until route appears or the connection times
 		 * itself out.
 		 */
-		rt = ip_route_output_ports(sock_net(sk), sk,
+		rt = ip_route_output_ports(sock_net(sk), &fl4, sk,
 					   daddr, inet->inet_saddr,
 					   inet->inet_dport, inet->inet_sport,
 					   sk->sk_protocol, RT_CONN_FLAGS(sk),
diff --git a/net/rxrpc/ar-peer.c b/net/rxrpc/ar-peer.c
index 55b93dc..b6ff063 100644
--- a/net/rxrpc/ar-peer.c
+++ b/net/rxrpc/ar-peer.c
@@ -36,10 +36,11 @@ static void rxrpc_destroy_peer(struct work_struct *work);
 static void rxrpc_assess_MTU_size(struct rxrpc_peer *peer)
 {
 	struct rtable *rt;
+	struct flowi4 fl4;
 
 	peer->if_mtu = 1500;
 
-	rt = ip_route_output_ports(&init_net, NULL,
+	rt = ip_route_output_ports(&init_net, &fl4, NULL,
 				   peer->srx.transport.sin.sin_addr.s_addr, 0,
 				   htons(7000), htons(7001),
 				   IPPROTO_UDP, 0, 0);
-- 
1.7.4.5
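
The point of handing the flow key in from the caller can be sketched in
user space as follows. The later patches in this series read fl4.saddr
and fl4.daddr after the call, which suggests the lookup leaves the
selected addresses in the key; the sketch simply assumes that behaviour.
All demo_* names are hypothetical stand-ins, and the "chosen_saddr"
parameter fakes the source address selection that the real routing code
would perform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins; the real struct flowi4 and struct rtable are larger. */
struct demo_flowi4 {
	uint32_t daddr;
	uint32_t saddr;
};

struct demo_route {
	int valid;
};

/*
 * Hypothetical lookup: initializes the caller's key and, when no source
 * address was requested (saddr == 0), stores the selected one in the key
 * so the caller can read it back after the call.
 */
static struct demo_route demo_route_output(struct demo_flowi4 *fl4,
					   uint32_t daddr, uint32_t saddr,
					   uint32_t chosen_saddr)
{
	struct demo_route rt = { .valid = 1 };

	memset(fl4, 0, sizeof(*fl4));
	fl4->daddr = daddr;
	fl4->saddr = saddr ? saddr : chosen_saddr;
	return rt;
}
```

A caller declares "struct demo_flowi4 fl4;" on its stack, passes &fl4 to
the lookup, and afterwards uses fl4.saddr where it previously would have
used a field of the route itself.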


^ permalink raw reply related

* [PATCH 5/8] pptp: Use flowi4's daddr/saddr in pptp_xmit().
From: David Miller @ 2011-05-04 17:14 UTC (permalink / raw)
  To: netdev


Instead of rt->rt_{src,dst}

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/pptp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/pptp.c b/drivers/net/pptp.c
index e771e8d..1286fe2 100644
--- a/drivers/net/pptp.c
+++ b/drivers/net/pptp.c
@@ -271,8 +271,8 @@ static int pptp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
 		iph->frag_off	=	0;
 	iph->protocol = IPPROTO_GRE;
 	iph->tos      = 0;
-	iph->daddr    = rt->rt_dst;
-	iph->saddr    = rt->rt_src;
+	iph->daddr    = fl4.daddr;
+	iph->saddr    = fl4.saddr;
 	iph->ttl      = ip4_dst_hoplimit(&rt->dst);
 	iph->tot_len  = htons(skb->len);
 
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 6/8] libcxgbi: Use flowi4's saddr in cxgbi_check_route().
From: David Miller @ 2011-05-04 17:14 UTC (permalink / raw)
  To: netdev


Instead of rt->rt_src

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/scsi/cxgbi/libcxgbi.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 0c33d25..a2a9c7c 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -450,13 +450,13 @@ static struct cxgbi_sock *cxgbi_sock_create(struct cxgbi_device *cdev)
 	return csk;
 }
 
-static struct rtable *find_route_ipv4(__be32 saddr, __be32 daddr,
+static struct rtable *find_route_ipv4(struct flowi4 *fl4,
+				      __be32 saddr, __be32 daddr,
 				      __be16 sport, __be16 dport, u8 tos)
 {
 	struct rtable *rt;
-	struct flowi4 fl4;
 
-	rt = ip_route_output_ports(&init_net, &fl4, NULL, daddr, saddr,
+	rt = ip_route_output_ports(&init_net, fl4, NULL, daddr, saddr,
 				   dport, sport, IPPROTO_TCP, tos, 0);
 	if (IS_ERR(rt))
 		return NULL;
@@ -471,6 +471,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 	struct net_device *ndev;
 	struct cxgbi_device *cdev;
 	struct rtable *rt = NULL;
+	struct flowi4 fl4;
 	struct cxgbi_sock *csk = NULL;
 	unsigned int mtu = 0;
 	int port = 0xFFFF;
@@ -483,7 +484,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 		goto err_out;
 	}
 
-	rt = find_route_ipv4(0, daddr->sin_addr.s_addr, 0, daddr->sin_port, 0);
+	rt = find_route_ipv4(&fl4, 0, daddr->sin_addr.s_addr, 0, daddr->sin_port, 0);
 	if (!rt) {
 		pr_info("no route to ipv4 0x%x, port %u.\n",
 			daddr->sin_addr.s_addr, daddr->sin_port);
@@ -532,7 +533,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 	csk->daddr.sin_addr.s_addr = daddr->sin_addr.s_addr;
 	csk->daddr.sin_port = daddr->sin_port;
 	csk->daddr.sin_family = daddr->sin_family;
-	csk->saddr.sin_addr.s_addr = rt->rt_src;
+	csk->saddr.sin_addr.s_addr = fl4.saddr;
 
 	return csk;
 
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 7/8] ipv4: Use flowi4's {saddr,daddr} in igmpv3_newpack() and igmp_send_report()
From: David Miller @ 2011-05-04 17:15 UTC (permalink / raw)
  To: netdev


Instead of rt->rt_{src,dst}

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/igmp.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 7c2ef59..ec03c2f 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -344,8 +344,8 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 	pip->tos      = 0xc0;
 	pip->frag_off = htons(IP_DF);
 	pip->ttl      = 1;
-	pip->daddr    = rt->rt_dst;
-	pip->saddr    = rt->rt_src;
+	pip->daddr    = fl4.daddr;
+	pip->saddr    = fl4.saddr;
 	pip->protocol = IPPROTO_IGMP;
 	pip->tot_len  = 0;	/* filled in later */
 	ip_select_ident(pip, &rt->dst, NULL);
@@ -687,7 +687,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc,
 	iph->frag_off = htons(IP_DF);
 	iph->ttl      = 1;
 	iph->daddr    = dst;
-	iph->saddr    = rt->rt_src;
+	iph->saddr    = fl4.saddr;
 	iph->protocol = IPPROTO_IGMP;
 	ip_select_ident(iph, &rt->dst, NULL);
 	((u8*)&iph[1])[0] = IPOPT_RA;
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 8/8] sctp: Use flowi4's {saddr,daddr} in sctp_v4_dst_saddr() and sctp_v4_get_dst()
From: David Miller @ 2011-05-04 17:15 UTC (permalink / raw)
  To: netdev


Instead of rt->rt_{src,dst}

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/protocol.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 9d3f159..69fbc55 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -339,13 +339,12 @@ static int sctp_v4_to_addr_param(const union sctp_addr *addr,
 }
 
 /* Initialize a sctp_addr from a dst_entry. */
-static void sctp_v4_dst_saddr(union sctp_addr *saddr, struct dst_entry *dst,
+static void sctp_v4_dst_saddr(union sctp_addr *saddr, struct flowi4 *fl4,
 			      __be16 port)
 {
-	struct rtable *rt = (struct rtable *)dst;
 	saddr->v4.sin_family = AF_INET;
 	saddr->v4.sin_port = port;
-	saddr->v4.sin_addr.s_addr = rt->rt_src;
+	saddr->v4.sin_addr.s_addr = fl4->saddr;
 }
 
 /* Compare two addresses exactly. */
@@ -508,7 +507,7 @@ static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 		/* Walk through the bind address list and look for a bind
 		 * address that matches the source address of the returned dst.
 		 */
-		sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
+		sctp_v4_dst_saddr(&dst_saddr, fl4, htons(bp->port));
 		rcu_read_lock();
 		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
 			if (!laddr->valid || (laddr->state != SCTP_ADDR_SRC))
@@ -550,7 +549,7 @@ out:
 	t->dst = dst;
 	if (dst)
 		SCTP_DEBUG_PRINTK("rt_dst:%pI4, rt_src:%pI4\n",
-				  &rt->rt_dst, &rt->rt_src);
+				  &fl4->daddr, &fl4->saddr);
 	else
 		SCTP_DEBUG_PRINTK("NO ROUTE\n");
 }
-- 
1.7.4.5


^ permalink raw reply related

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Ben Hutchings @ 2011-05-04 17:24 UTC (permalink / raw)
  To: Dimitris Michailidis; +Cc: Alexander Duyck, davem, jeffrey.t.kirsher, netdev
In-Reply-To: <4DC1883F.7050301@chelsio.com>

On Wed, 2011-05-04 at 10:09 -0700, Dimitris Michailidis wrote:
> On 05/03/2011 04:34 PM, Ben Hutchings wrote:
> > On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
> >> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where there is 
> >> enough knowledge to pick an appropriate slot.  So I'd remove the
> >>
> >>      if (loc == RX_CLS_LOC_UNSPEC)
> >>
> >> block above, let the driver pick a slot, and then pass the selected location 
> >> back for ethtool to report.
> > 
> > But first we have to specify this in the ethtool API.  So please propose
> > a patch to ethtool.h.
> 
> In the past we discussed that being able to specify the first available slot or 
> the last available would be useful, so something like the below?
>
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 4194a20..909ef79 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -442,7 +442,8 @@ struct ethtool_flow_ext {
>    *	includes the %FLOW_EXT flag.
>    * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
>    *	if packets should be discarded
> - * @location: Index of filter in hardware table
> + * @location: Index of filter in hardware table, or %RX_CLS_FLOW_FIRST_LOC for
> + *	first available index, or %RX_CLS_FLOW_LAST_LOC for last available
[...]

I think that's reasonable.  We should also explicitly state that
location determines priority, i.e. if a packet matches two filters then
the one with the lower location wins.
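
As a rough illustration of that rule (a sketch only; the struct and
function names below are made up for this example and are not part of
ethtool.h or of any driver):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter entry, for illustration only. */
struct demo_filter {
	uint32_t location;	/* index in the hardware table */
	uint32_t value;		/* bits the key must equal after masking */
	uint32_t mask;		/* which bits of the key are compared */
	int ring;		/* RX ring to deliver to */
};

/* Return the matching filter with the lowest location, or NULL if none match. */
static const struct demo_filter *demo_classify(const struct demo_filter *tbl,
					       size_t n, uint32_t key)
{
	const struct demo_filter *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((key & tbl[i].mask) != tbl[i].value)
			continue;
		/* Lower location means higher priority. */
		if (!best || tbl[i].location < best->location)
			best = &tbl[i];
	}
	return best;
}
```

The table does not need to be sorted for this to work; a TCAM would get
the same result for free from its lookup order.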

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: ath5k regression associating with APs in 2.6.38
From: John W. Linville @ 2011-05-04 17:27 UTC (permalink / raw)
  To: Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez, Bob Copeland,
	linux-wireless
In-Reply-To: <20110504153819.GA4551@thinkpad-t410>

On Wed, May 04, 2011 at 10:38:19AM -0500, Seth Forshee wrote:
> I've been investigating some reports of a regression in associating with
> APs with AR2413 in 2.6.38. Association repeatedly fails with some
> "direct probe to x timed out" messages (see syslog excerpt below),
> although it will generally associate eventually, after many tries.
> 
> Bisection identifies 8aec7af (ath5k: Support synth-only channel change
> for AR2413/AR5413) as offending commit. Prior to this commit there are
> no direct probe messages at all in the logs. I've also found that
> forcing fast to false at the top of ath5k_hw_reset() fixes the issue.
> I'm not sure what the connection is between this commit and the
> timeouts. Any suggestions?

Have you tried reverting that commit on top of 2.6.38?  Can you
recreate the issue with 2.6.39-rc6 (or later)?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Dimitris Michailidis @ 2011-05-04 17:33 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Alexander Duyck, davem, jeffrey.t.kirsher, netdev
In-Reply-To: <1304529892.2926.14.camel@bwh-desktop>

On 05/04/2011 10:24 AM, Ben Hutchings wrote:
> On Wed, 2011-05-04 at 10:09 -0700, Dimitris Michailidis wrote:
>> On 05/03/2011 04:34 PM, Ben Hutchings wrote:
>>> On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
>>>> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where there is 
>>>> enough knowledge to pick an appropriate slot.  So I'd remove the
>>>>
>>>>      if (loc == RX_CLS_LOC_UNSPEC)
>>>>
>>>> block above, let the driver pick a slot, and then pass the selected location 
>>>> back for ethtool to report.
>>> But first we have to specify this in the ethtool API.  So please propose
>>> a patch to ethtool.h.
>> In the past we discussed that being able to specify the first available slot or 
>> the last available would be useful, so something like the below?
>>
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index 4194a20..909ef79 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -442,7 +442,8 @@ struct ethtool_flow_ext {
>>    *	includes the %FLOW_EXT flag.
>>    * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
>>    *	if packets should be discarded
>> - * @location: Index of filter in hardware table
>> + * @location: Index of filter in hardware table, or %RX_CLS_FLOW_FIRST_LOC for
>> + *	first available index, or %RX_CLS_FLOW_LAST_LOC for last available
> [...]
> 
> I think that's reasonable.  We should also explicitly state that
> location determines priority, i.e. if a packet matches two filters then
> the one with the lower location wins.

Easy and true for a TCAM.  For hashing would you use the location to decide how 
to order filters that fall in the same bucket?


^ permalink raw reply

* Re: [PATCH] net: add mac_pton() for parsing MAC address
From: Alexey Dobriyan @ 2011-05-04 17:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev
In-Reply-To: <20110504081225.267a0833@nehalam>

On Wed, May 04, 2011 at 08:12:25AM -0700, Stephen Hemminger wrote:
> On Wed, 4 May 2011 09:15:51 +0300
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > +int mac_pton(const char *s, u8 *mac)
> > +{
> > +	int i;
> > +
> > +	/* XX:XX:XX:XX:XX:XX */
> > +	if (strlen(s) < 3 * ETH_ALEN - 1)
> > +		return 0;
> > +
> > +	/* Don't half dirty result. */
> Shouldn't this be "Don't allow dirty result."?

Maybe.
It means "only dirty the result if everything is OK", like inet_pton(3).

> > +	for (i = 0; i < ETH_ALEN; i++) {
> > +		if (!strchr("0123456789abcdefABCDEF", s[i * 3]))
> > +			return 0;
> > +		if (!strchr("0123456789abcdefABCDEF", s[i * 3 + 1]))
> > +			return 0;
> 
> 		if (!isxdigit(s[i*3]) || !isxdigit(s[i*3+1]))
> 			return 0;
> 
> > +		if (i != ETH_ALEN - 1 && s[i * 3 + 2] != ':')
> > +			return 0;
> > +	}
> > +	for (i = 0; i < ETH_ALEN; i++) {
> > +		mac[i] = (hex_to_bin(s[i * 3]) << 4) | hex_to_bin(s[i * 3 + 1]);
> 		hex2bin(&mac[i], &s[i*3], 1);
> > +	}
> > +	return 1;
> > +}
> > +EXPORT_SYMBOL(mac_pton);
> 
> Also don't need two loops, okay to parse partial result.

You need two loops if the code is written as sent.
Otherwise, the caller needs a temporary buffer to avoid corrupting a
possibly important previous MAC value.
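
To make the point concrete, here is a userspace sketch of the
validate-first, write-second shape. It is not the kernel patch itself:
the demo_ names are invented, and isxdigit() plus a local helper stand
in for the kernel's hex_to_bin().

```c
#include <ctype.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Convert one hex digit; the caller must have checked it with isxdigit(). */
static int demo_hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;
}

/* Parse "XX:XX:XX:XX:XX:XX". Returns 1 on success, 0 on failure. */
static int demo_mac_pton(const char *s, unsigned char *mac)
{
	int i;

	if (strlen(s) < 3 * DEMO_ETH_ALEN - 1)
		return 0;

	/* Pass 1: validate everything, touch nothing. */
	for (i = 0; i < DEMO_ETH_ALEN; i++) {
		if (!isxdigit((unsigned char)s[i * 3]) ||
		    !isxdigit((unsigned char)s[i * 3 + 1]))
			return 0;
		if (i != DEMO_ETH_ALEN - 1 && s[i * 3 + 2] != ':')
			return 0;
	}

	/* Pass 2: input is known good, so mac[] is only written on success. */
	for (i = 0; i < DEMO_ETH_ALEN; i++)
		mac[i] = (demo_hex_val(s[i * 3]) << 4) |
			 demo_hex_val(s[i * 3 + 1]);
	return 1;
}
```

With a single loop, a string that goes bad in the last octet would
already have overwritten the first five bytes of mac[].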

^ permalink raw reply

* Re: [ethtool PATCH] FW dump support
From: Ben Hutchings @ 2011-05-04 17:40 UTC (permalink / raw)
  To: anirban.chakraborty; +Cc: netdev, --no-chain-reply-to, davem
In-Reply-To: <1304378957-24123-1-git-send-email-anirban.chakraborty@qlogic.com>

On Mon, 2011-05-02 at 16:29 -0700, anirban.chakraborty@qlogic.com wrote:
> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
> 
> Added support to take FW dump via ethtool.
[...]
> --- a/ethtool.c
> +++ b/ethtool.c
[...]
> @@ -263,6 +270,12 @@ static struct option {
>  		"Get Rx ntuple filters and actions\n" },
>      { "-P", "--show-permaddr", MODE_PERMADDR,
>  		"Show permanent hardware address" },
> +    { "-W", "--get-dump", MODE_GET_DUMP,
> +		"Get dump level\n" },
> +    { "-Wd", "--get-dump-data", MODE_GET_DUMP_DATA,
> +		"Get dump data", "FILENAME " "Name of the dump file\n" },

The short options should only include one letter.  Also the general
pattern is that 'get' options use lower-case letters and 'set' options
use upper-case letters.  No, I'm not sure how best to handle a set of 3
options.  Maybe you can combine --get-dump and --get-dump-data, making
the filename optional?

> +    { "-w", "--set-dump", MODE_SET_DUMP,
> +		"Set dump level", "DUMPLEVEL " "Dump level for the device\n" },

The field this sets is described as 'flags' so does it consist of flags
or is it a level?

[...]
> @@ -3241,6 +3270,86 @@ static int do_grxntuple(int fd, struct ifreq *ifr)
>  	return 0;
>  }
>  
> +static void do_writedump(struct ethtool_dump *dump)
> +{
> +	FILE *f = fopen(dump_file, "wb+");
> +	size_t bytes;
> +
> +	if (!f ) {
> +		fprintf(stderr, "Can't open file %s: %s\n",
> +			dump_file, strerror(errno));
> +		return;

On error, we must exit with code 1.

> +	}
> +
> +	bytes = fwrite(dump->data, 1, dump->len, f);
> +	fclose(f);
[...]

These functions can also fail and need to be checked.  (Yes, fclose()
can fail, since it may have to flush buffered data.)
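
Roughly what I have in mind, as a standalone sketch rather than a
drop-in replacement (demo_write_dump() and its parameters are made up
here; the real function takes a struct ethtool_dump and uses the
dump_file global):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write len bytes to path. Returns 0 on success, 1 on any failure. */
static int demo_write_dump(const char *path, const void *data, size_t len)
{
	FILE *f = fopen(path, "wb");
	int rc = 0;

	if (!f) {
		fprintf(stderr, "Can't open file %s: %s\n",
			path, strerror(errno));
		return 1;
	}

	/* A short write is an error even if some bytes went out. */
	if (fwrite(data, 1, len, f) != len) {
		fprintf(stderr, "Can't write all of dump data to %s\n", path);
		rc = 1;
	}

	/* fclose() flushes buffered data, so it can fail too. */
	if (fclose(f) != 0) {
		fprintf(stderr, "Can't close file %s: %s\n",
			path, strerror(errno));
		rc = 1;
	}

	return rc;
}
```

The caller can then pass the return value straight through as the exit
code.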

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [ethtool PATCH 4/4] v5 Add RX packet classification interface
From: Alexander Duyck @ 2011-05-04 17:41 UTC (permalink / raw)
  To: Dimitris Michailidis
  Cc: Ben Hutchings, davem@davemloft.net, Kirsher, Jeffrey T,
	netdev@vger.kernel.org
In-Reply-To: <4DC18DF8.3090707@chelsio.com>

On 5/4/2011 10:33 AM, Dimitris Michailidis wrote:
> On 05/04/2011 10:24 AM, Ben Hutchings wrote:
>> On Wed, 2011-05-04 at 10:09 -0700, Dimitris Michailidis wrote:
>>> On 05/03/2011 04:34 PM, Ben Hutchings wrote:
>>>> On Tue, 2011-05-03 at 16:23 -0700, Dimitris Michailidis wrote:
>>>>> I think RX_CLS_LOC_UNSPEC should be passed to the driver, where there is
>>>>> enough knowledge to pick an appropriate slot.  So I'd remove the
>>>>>
>>>>>       if (loc == RX_CLS_LOC_UNSPEC)
>>>>>
>>>>> block above, let the driver pick a slot, and then pass the selected location
>>>>> back for ethtool to report.
>>>> But first we have to specify this in the ethtool API.  So please propose
>>>> a patch to ethtool.h.
>>> In the past we discussed that being able to specify the first available slot or
>>> the last available would be useful, so something like the below?
>>>
>>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>>> index 4194a20..909ef79 100644
>>> --- a/include/linux/ethtool.h
>>> +++ b/include/linux/ethtool.h
>>> @@ -442,7 +442,8 @@ struct ethtool_flow_ext {
>>>     *	includes the %FLOW_EXT flag.
>>>     * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
>>>     *	if packets should be discarded
>>> - * @location: Index of filter in hardware table
>>> + * @location: Index of filter in hardware table, or %RX_CLS_FLOW_FIRST_LOC for
>>> + *	first available index, or %RX_CLS_FLOW_LAST_LOC for last available
>> [...]
>>
>> I think that's reasonable.  We should also explicitly state that
>> location determines priority, i.e. if a packet matches two filters then
>> the one with the lower location wins.
>
> Easy and true for a TCAM.  For hashing would you use the location to decide how
> to order filters that fall in the same bucket?
>

The problem is none of this is backwards compatible.  The niu driver has 
supported the network flow classifier rules since 2.6.30.  Adding this 
would cause all rule setups for niu to fail because these locations 
would have to exist outside of the current rule locations.

This is why I was suggesting that the best approach would be to update
the kernel to add a separate ioctl for letting the driver set up the
location.  We can just attempt to make that call and when we get the
EOPNOTSUPP errno we know the device driver doesn't support it and can
then let the rule manager take over.
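
In userspace terms the flow would look something like the sketch below.
Everything named demo_ is hypothetical: the function pointers stand in
for the not-yet-existing ioctl and for the rule manager, and the stub
drivers are only there to exercise the fallback.

```c
#include <errno.h>

/* Both callbacks return 0 and fill *loc, or return -1 with errno set. */
typedef int (*demo_pick_fn)(unsigned int *loc);

/* Ask the driver first; fall back to the rule manager on EOPNOTSUPP only. */
static int demo_pick_location(demo_pick_fn drv, demo_pick_fn mgr,
			      unsigned int *loc)
{
	if (drv(loc) == 0)
		return 0;
	if (errno != EOPNOTSUPP)
		return -1;	/* a real failure, do not paper over it */
	return mgr(loc);
}

/* Stub: an older driver (niu today, say) that lacks the new call. */
static int demo_drv_unsupported(unsigned int *loc)
{
	(void)loc;
	errno = EOPNOTSUPP;
	return -1;
}

/* Stub: a driver that picks its own slot. */
static int demo_drv_ok(unsigned int *loc)
{
	*loc = 42;
	return 0;
}

/* Stub: a driver that supports the call but fails for another reason. */
static int demo_drv_broken(unsigned int *loc)
{
	(void)loc;
	errno = EIO;
	return -1;
}

/* Stub: the ethtool-side rule manager choosing a free slot. */
static int demo_mgr_first_free(unsigned int *loc)
{
	*loc = 7;
	return 0;
}
```

Existing drivers keep working unchanged because they never see the new
request succeed.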

Thanks,

Alex

^ permalink raw reply

* Re: Support e1000 M88 PHY registers in -d
From: Ben Hutchings @ 2011-05-04 17:41 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: Anthony DeRobertis, netdev, e1000-devel, 574574
In-Reply-To: <1301910102.2935.52.camel@localhost>

On Mon, 2011-04-04 at 10:41 +0100, Ben Hutchings wrote:
> On Mon, 2011-04-04 at 01:36 -0700, Jeff Kirsher wrote:
> > On Sat, Apr 2, 2011 at 09:24, Ben Hutchings <bhutchings@solarflare.com> wrote:
> > > Anthony,
> > >
> > > I'm now upstream maintainer for ethtool so I've picked up your patch
> > > again.
> > >
> > > On Fri, 2010-03-19 at 00:32 -0400, Anthony DeRobertis wrote:
> > >> Package: ethtool
> > >> Version: 1:2.6.33-1
> > >> Severity: wishlist
> > >>
> > >> The M88 PHY registers contain useful information like the cable length
> > >> estimate and the MDI/MDIX status. The attached patch makes -d dump
> > >> them.
> > >
> > > Patches for ethtool should include a commit message and Signed-off-by
> > > line, as in the Linux kernel.  See sections 2 and 12 of
> > > <http://www.kernel.org/doc/Documentation/SubmittingPatches>.  They
> > > should be sent to this address and to netdev.
> > >
> > > I'm forwarding this patch to netdev and the e1000 developers for review.
> > >
> > > Ben.
> > 
> > Thanks Ben.  Just to be clear, have you applied these e1000 changes to
> > the ethtool?
> [...]
> 
> I have not applied these changes anywhere.

...but I will if I don't hear back from you soon.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: Divide error in bictcp_cong_avoid ?
From: Randy Dunlap @ 2011-05-04 17:49 UTC (permalink / raw)
  To: TB, netdev; +Cc: linux-kernel
In-Reply-To: <4DC178D3.6030308@techboom.com>

[add cc to netdev]


On Wed, 04 May 2011 12:03:31 -0400 TB wrote:

> We're having this issue sporadically on a few servers and this is the 
> backtrace we get from netconsole.
> 
> 
> [28522.642419] divide error: 0000 [#1] SMP
> [28522.642457] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/vendor
> [28522.642504] CPU 0
> [28522.642511] Modules linked in: i2c_i801 i2c_core evdev button
> [28522.642570]
> [28522.642590] Pid: 0, comm: swapper Not tainted 2.6.38.5 #6 Supermicro X8DTH-i/6/iF/6F/X8DTH
> [28522.642651] RIP: 0010:[<ffffffff8150b27b>] [<ffffffff8150b27b>] bictcp_cong_avoid+0x21a/0x247
> [28522.642708] RSP: 0018:ffff8800bf403a90  EFLAGS: 00010202
> [28522.642735] RAX: 0000000000000010 RBX: ffff880352aa6400 RCX: 0000000000000000
> [28522.642765] RDX: 0000000000000000 RSI: ffff880352aa67c0 RDI: 0000000000001607
> [28522.642795] RBP: 000000007caa5a1b R08: 00000000000035c2 R09: 00000000000000e6
> [28522.642825] R10: ffff88003d499c00 R11: ffff880109831b00 R12: ffffffff817cecd0
> [28522.642855] R13: 0000000000000004 R14: 000000000001001b R15: 0000000000000123
> [28522.642886] FS:  0000000000000000(0000) GS:ffff8800bf400000(0000) knlGS:0000000000000000
> [28522.642932] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [28522.642959] CR2: 00007fb4c6ffd000 CR3: 000000042e4a1000 CR4: 00000000000006f0
> [28522.642990] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [28522.643020] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [28522.643050] Process swapper (pid: 0, threadinfo ffffffff8176c000, task ffffffff81777020)
> [28522.643095] Stack:
> [28522.643116]  ffff880352aa6400 ffffffff817cecd0 0000000000000004 0000000000000406
> [28522.643171]  ffff880352aa6400 ffffffff814e3dc5 000000000000111c ffff88022a9b3200
> [28522.643226]  0000000000000000 7caa5a1b7caa4ec3 0000000000000000 0000000000000000
> [28522.643281] Call Trace:
> [28522.643303]  <IRQ>
> [28522.643330]  [<ffffffff814e3dc5>] ? tcp_ack+0x18b5/0x1a89
> [28522.643359]  [<ffffffff814e45c2>] ? tcp_rcv_established+0xd1/0xa13
> [28522.643389]  [<ffffffff814ec60b>] ? tcp_v4_do_rcv+0x1b2/0x382
> [28522.643418]  [<ffffffff814c95d4>] ? nf_iterate+0x40/0x78
> [28522.643446]  [<ffffffff814ecc5f>] ? tcp_v4_rcv+0x484/0x797
> [28522.643475]  [<ffffffff814d11c7>] ? ip_local_deliver_finish+0xab/0x139
> [28522.643505]  [<ffffffff814ae2b3>] ? __netif_receive_skb+0x31c/0x349
> [28522.643535]  [<ffffffff814aec82>] ? netif_receive_skb+0x67/0x6d
> [28522.643564]  [<ffffffff814af1fb>] ? napi_gro_receive+0x9d/0xab
> [28522.643592]  [<ffffffff814aed57>] ? napi_skb_finish+0x1c/0x31
> [28522.643623]  [<ffffffff813e4248>] ? igb_poll+0x7d5/0xb2e
> [28522.643653]  [<ffffffff812b6b22>] ? blk_run_queue+0x23/0x37
> [28522.643683]  [<ffffffff813520d4>] ? scsi_run_queue+0x2ee/0x381
> [28522.643712]  [<ffffffff81353810>] ? scsi_io_completion+0x3e0/0x409
> [28522.643741]  [<ffffffff814af337>] ? net_rx_action+0xa7/0x212
> [28522.643771]  [<ffffffff8103b6c2>] ? __do_softirq+0xbe/0x184
> [28522.643800]  [<ffffffff8100364c>] ? call_softirq+0x1c/0x28
> [28522.643828]  [<ffffffff81005085>] ? do_softirq+0x31/0x63
> [28522.643856]  [<ffffffff8103b56c>] ? irq_exit+0x36/0x78
> [28522.643883]  [<ffffffff81004784>] ? do_IRQ+0x98/0xae
> [28522.643912]  [<ffffffff81562c13>] ? ret_from_intr+0x0/0xe
> [28522.643938]  <EOI>
> [28522.643963]  [<ffffffff81009a41>] ? mwait_idle+0xb9/0xf3
> [28522.643991]  [<ffffffff81001c6e>] ? cpu_idle+0x57/0x8d
> [28522.644019]  [<ffffffff81801c49>] ? start_kernel+0x34e/0x35a
> [28522.644048]  [<ffffffff81801398>] ? x86_64_start_kernel+0xf3/0xf9
> [28522.644075] Code: 39 c9 76 18 44 29 c9 31 d2 44 89 c8 f7 f1 39 83 c0 03 00 00 76 06 89 83 c0 03 00 00 8b 83 c0 03 00 00 31 d2 c1 e0 04 0f b7 4e 2c <f7> f1 ba 01 00 00 00 85 c0 0f 45 d0 89 93 c0 03 00 00 8b b3 c0
> [28522.644338] RIP  [<ffffffff8150b27b>] bictcp_cong_avoid+0x21a/0x247
> [28522.644371]  RSP <ffff8800bf403a90>
> [28522.644733] ---[ end trace 9db294ef7ff3a7b5 ]---
> [28522.644800] Kernel panic - not syncing: Fatal exception in interrupt
> [28522.644871] Pid: 0, comm: swapper Tainted: G      D     2.6.38.5 #6
> [28522.644942] Call Trace:
> [28522.645012]  <IRQ>  [<ffffffff81560690>] ? panic+0x9d/0x1a0
> [28522.645131]  [<ffffffff81562c13>] ? ret_from_intr+0x0/0xe
> [28522.645200]  [<ffffffff810365bb>] ? kmsg_dump+0x46/0xec
> [28522.645268]  [<ffffffff81006176>] ? oops_end+0x9f/0xac
> [28522.645335]  [<ffffffff810040d8>] ? do_divide_error+0x7f/0x89
> [28522.645404]  [<ffffffff8150b27b>] ? bictcp_cong_avoid+0x21a/0x247
> [28522.645473]  [<ffffffff814b057c>] ? dev_queue_xmit+0x4a4/0x4b2
> [28522.645545]  [<ffffffff814d5390>] ? ip_queue_xmit+0x2e9/0x32f
> [28522.645614]  [<ffffffff81003375>] ? divide_error+0x15/0x20
> [28522.645685]  [<ffffffff8150b27b>] ? bictcp_cong_avoid+0x21a/0x247
> [28522.645754]  [<ffffffff814e3dc5>] ? tcp_ack+0x18b5/0x1a89
> [28522.645823]  [<ffffffff814e45c2>] ? tcp_rcv_established+0xd1/0xa13
> [28522.645892]  [<ffffffff814ec60b>] ? tcp_v4_do_rcv+0x1b2/0x382
> [28522.645961]  [<ffffffff814c95d4>] ? nf_iterate+0x40/0x78
> [28522.646029]  [<ffffffff814ecc5f>] ? tcp_v4_rcv+0x484/0x797
> [28522.646097]  [<ffffffff814d11c7>] ? ip_local_deliver_finish+0xab/0x139
> [28522.646167]  [<ffffffff814ae2b3>] ? __netif_receive_skb+0x31c/0x349
> [28522.646240]  [<ffffffff814aec82>] ? netif_receive_skb+0x67/0x6d
> [28522.646308]  [<ffffffff814af1fb>] ? napi_gro_receive+0x9d/0xab
> [28522.646377]  [<ffffffff814aed57>] ? napi_skb_finish+0x1c/0x31
> [28522.646445]  [<ffffffff813e4248>] ? igb_poll+0x7d5/0xb2e
> [28522.646513]  [<ffffffff812b6b22>] ? blk_run_queue+0x23/0x37
> [28522.646582]  [<ffffffff813520d4>] ? scsi_run_queue+0x2ee/0x381
> [28522.646651]  [<ffffffff81353810>] ? scsi_io_completion+0x3e0/0x409
> [28522.646721]  [<ffffffff814af337>] ? net_rx_action+0xa7/0x212
> [28522.646791]  [<ffffffff8103b6c2>] ? __do_softirq+0xbe/0x184
> [28522.646884]  [<ffffffff8100364c>] ? call_softirq+0x1c/0x28
> [28522.646953]  [<ffffffff81005085>] ? do_softirq+0x31/0x63
> [28522.647021]  [<ffffffff8103b56c>] ? irq_exit+0x36/0x78
> [28522.647089]  [<ffffffff81004784>] ? do_IRQ+0x98/0xae
> [28522.647164]  [<ffffffff81562c13>] ? ret_from_intr+0x0/0xe
> [28522.647239]  <EOI>  [<ffffffff81009a41>] ? mwait_idle+0xb9/0xf3
> [28522.647354]  [<ffffffff81001c6e>] ? cpu_idle+0x57/0x8d
> [28522.647422]  [<ffffffff81801c49>] ? start_kernel+0x34e/0x35a
> [28522.647491]  [<ffffffff81801398>] ? x86_64_start_kernel+0xf3/0xf9
> --


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

