Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 4/7] net-tcp: Fast Open client - receiving SYN-ACK
From: Yuchung Cheng @ 2012-07-16 23:33 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: davem, hkchu, edumazet, ncardwell, sivasankar, netdev
In-Reply-To: <CAGK4HS9iEVy6NEx_dJgdEW8P2K9_G0vhziE+7O7iW+6WFc3VEw@mail.gmail.com>

On Mon, Jul 16, 2012 at 3:57 PM, Vijay Subramanian
<subramanian.vijay@gmail.com> wrote:
> On 16 July 2012 14:16, Yuchung Cheng <ycheng@google.com> wrote:
>> On receiving the SYN-ACK after SYN-data, the client needs to
>> a) update the cached MSS and cookie (if included in SYN-ACK)
>> b) retransmit the data yet acknowledged by the SYN-ACK in the final ACK of
>>    the handshake.
>
> I think you mean "data not yet acknowledged " in point (b).
yes "not yet". I will fix it in the next round after I collect more
reviews. Thanks!
>
> Vijay

^ permalink raw reply

* Re: [ethtool PATCH] ethtool: Resolve use of uninitialized memory in rxclass_get_dev_info
From: Alexander Duyck @ 2012-07-17  0:10 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, jeffrey.t.kirsher
In-Reply-To: <1342469017.2523.33.camel@bwh-desktop.uk.solarflarecom.com>

On 07/16/2012 01:03 PM, Ben Hutchings wrote:
> On Fri, 2012-07-13 at 09:55 -0700, Alexander Duyck wrote:
>> The ethtool function for getting the rule count was not zeroing out the
>> data field before passing it to the kernel.  As a result the value started
>> uninitialized and was incorrectly returning a result indicating that
>> devices supported setting new rule indexes.  In order to correct this I am
>> adding a one line fix that sets data to zero before we pass the command to
>> the kernel.
> Right.  For 'get' commands with no parameters (besides the device) the
> data copied back to userland is normally zero-initialised and then
> filled out by the driver, and I seem to have worked on that assumption.
> But because of the odd multiplexing of RX NFC commands
> ETHTOOL_GRXCLSRLCNT doesn't work like that.  And for 'my' driver that
> didn't matter.  Sorry about that.
>
> (We should really have some explicit documentation of responsibility for
> structure initialisation.)
>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> ---
>>
>> I am resending this since I didn't see any notification that it had been seen.
>> I also realized that I had not clearly identified that this is an ethtool user
>> space patch and not an ethtool kernel space patch.
> It was perfectly clear and I had queued it up to review but hadn't yet
> done so.
>
> Ben.
>
Yeah, that was my mistake.  I thought I hadn't sent it out with the
ethtool prefix when I actually had.

Thanks,

Alex

^ permalink raw reply

* [PATCH net-next] bnx2: Try to recover from PCI block reset
From: Michael Chan @ 2012-07-17  0:25 UTC (permalink / raw)
  To: davem; +Cc: netdev, nhorman

If the PCI block has reset, the memory enable bit will be reset and
the device will not respond to MMIO access.  bnx2_reset_task() currently
will not recover when this happens.  Add code to detect this condition
and restore the PCI state.  This scenario has been reported by some
users.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 0ced154..79cebd8 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -6388,6 +6388,7 @@ bnx2_reset_task(struct work_struct *work)
 {
 	struct bnx2 *bp = container_of(work, struct bnx2, reset_task);
 	int rc;
+	u16 pcicmd;
 
 	rtnl_lock();
 	if (!netif_running(bp->dev)) {
@@ -6397,6 +6398,12 @@ bnx2_reset_task(struct work_struct *work)
 
 	bnx2_netif_stop(bp, true);
 
+	pci_read_config_word(bp->pdev, PCI_COMMAND, &pcicmd);
+	if (!(pcicmd & PCI_COMMAND_MEMORY)) {
+		/* in case PCI block has reset */
+		pci_restore_state(bp->pdev);
+		pci_save_state(bp->pdev);
+	}
 	rc = bnx2_init_nic(bp, 1);
 	if (rc) {
 		netdev_err(bp->dev, "failed to reset NIC, closing\n");
-- 
1.7.1

^ permalink raw reply related

* Re: [RFC 1/2] PCI-Express Non-Transparent Bridge Support
From: Jon Mason @ 2012-07-17  0:23 UTC (permalink / raw)
  To: chetan loke; +Cc: linux-kernel, netdev, linux-pci, Dave Jiang
In-Reply-To: <CAAsGZS43FkfTFkYnvFdaA+KSLD1ZVBfuFe7_2e7s29_X+rTmug@mail.gmail.com>

On Mon, Jul 16, 2012 at 03:27:48PM -0400, chetan loke wrote:
> On Mon, Jul 16, 2012 at 2:38 PM, Jon Mason <jon.mason@intel.com> wrote:
> > On Mon, Jul 16, 2012 at 12:49:39PM -0400, chetan loke wrote:
> 
> ....
> 
> >> Is it ok to rename the following vars for convenience sake?
> >>
> >> > +       struct list_head txq;
> >> tx_pend_q - (pending_queue) or tx_out_q - (outstanding_queue) - or
> >> pick any new string you like - other than a mono-syllable definition
> >>
> >> > +       struct list_head txc;
> >> tx_compl_q - completion queue
> >>
> >> > +       struct list_head txe;
> >> tx_avail_e - available entry queue
> >>
> >>
> >> > +       spinlock_t txq_lock;
> >> > +       spinlock_t txc_lock;
> >> > +       spinlock_t txe_lock;
> >>
> >> then match the variants accordingly.
> >>
> >> > +       struct list_head rxq;
> >> > +       struct list_head rxc;
> >> > +       struct list_head rxe;
> >> > +       spinlock_t rxq_lock;
> >> > +       spinlock_t rxc_lock;
> >> > +       spinlock_t rxe_lock;
> >>
> >> similarly the rx-counterpart
> >
> > Are they difficult to understand?  I can change them, but it seems rather moot since you seemed to immediately understand the logic behind the names.
> >
> 
> Immediately understand? Not at first glance. I had to re-read the
> functions. Its really is a minor change and variables will then become
> self-explanatory. I can almost feel that a developer who works on this
> code for the first time might end up choosing the wrong 'syllable' and
> locking an entirely different lock.
> 
> Infact add a prefix 'ntb' to all the rx/tx locks. That way grep'ing
> output of lockstat also becomes easier.
> 
> It now reads: ntb_tx_pend_lock

Makes sense

> 
> >>
> >>
> >> ..................
> >>
> >> > +static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> >> > +                            struct ntb_queue_entry *entry,
> >> > +                            void *offset)
> >> > +{
> >> > +       struct ntb_payload_header *hdr = offset;
> >> > +       int rc;
> >> > +
> >> > +       offset += sizeof(struct ntb_payload_header);
> >> > +       memcpy_toio(offset, entry->buf, entry->len);
> >> > +
> >> > +       hdr->len = entry->len;
> >> > +       hdr->ver = qp->tx_pkts;
> >> > +
> >> > +       /* Ensure that the data is fully copied out before setting the flag */
> >> > +       wmb();
> >> > +       hdr->flags = entry->flags | DESC_DONE_FLAG;
> >> > +
> >> > +       rc = ntb_ring_sdb(qp->ndev, qp->qp_num);
> >> > +       if (rc)
> >> > +               pr_err("%s: error ringing db %d\n", __func__, qp->qp_num);
> >> > +
> >> > +       if (entry->len > 0) {
> >>
> >> how do you interpret this len variable and decide if it's a good/bad completion?
> >
> > The length of 0 is for messages from the remote system, which currently only consists of a "link down" message notifying the local system to no longer send any messages to the remote side.
> >
> 
> May be I didn't read the code properly. Is there a length-comment that
> explains the above? If not then just by pure code inspection it would
> seem that a skb was leaked. So add the above comment that way we can
> save time for other netdev folks too.

I'll add a comment.

> 
> 
> >>
> >> > +               qp->tx_bytes += entry->len;
> >> > +
> >> > +               /* Add fully transmitted data to completion queue */
> >> > +               ntb_list_add_tail(&qp->txc_lock, &entry->entry, &qp->txc);
> >> > +
> >> > +               if (qp->tx_handler)
> >> > +                       qp->tx_handler(qp);
> >> > +       } else
> >> > +               ntb_list_add_tail(&qp->txe_lock, &entry->entry, &qp->txe);
> >>
> >> I could be wrong but how is the original skb handled if the code path
> >> goes in the else clause?
> >
> > There is no skb is the length is zero.  Since the only client is virtual ethernet, it will always be > 60.  However, I should add a sanity check for 0 length in tx_enqueue.
> >
> >> Also, in the else clause, how about a ntb_list_add_head rather than a _tail.
> >
> > Why add to the head, it was just used?
> 
> Yes, just re-use what's hot(best guess).
> 
> >
> >> > +
> >> > +static int ntb_process_tx(struct ntb_transport_qp *qp,
> >> > +                         struct ntb_queue_entry *entry)
> >> > +{
> >> > +       struct ntb_payload_header *hdr;
> >> > +       void *offset;
> >> > +
> >> > +       offset = qp->tx_offset;
> >> > +       hdr = offset;
> >> > +
> >> > +       pr_debug("%lld - offset %p, tx %p, entry len %d flags %x buff %p\n",
> >> > +                qp->tx_pkts, offset, qp->tx_offset, entry->len, entry->flags,
> >> > +                entry->buf);
> >> > +       if (hdr->flags) {
> >> > +               ntb_list_add_head(&qp->txq_lock, &entry->entry, &qp->txq);
> >> > +               qp->tx_ring_full++;
> >> > +               return -EAGAIN;
> >> > +       }
> >> > +
> >> > +       if (entry->len > transport_mtu) {
> >> > +               pr_err("Trying to send pkt size of %d\n", entry->len);
> >> > +               entry->flags = HW_ERROR_FLAG;
> >> > +
> >> > +               ntb_list_add_tail(&qp->txc_lock, &entry->entry, &qp->txc);
> >> > +
> >> > +               if (qp->tx_handler)
> >> > +                       qp->tx_handler(qp);
> >> > +
> >> > +               return 0;
> >> > +       }
> >> > +
> >> > +       ntb_tx_copy_task(qp, entry, offset);
> >>
> >> what happens when ntb_sdb_ring returns an error? would you still want
> >> to increment tx_pkts below?
> >
> > It's not fatal if the remote side isn't notified.  It will hurt latency, since the next packet would be the one that triggers the next interrupt.  Also, the only way it could ever fail would be if it was an invalid interrupt bit being set, which should never happen.
> >
> 
> What happens when the 'db >= ndev->max_cbs' check fails? Under what
> circumstances will that happen? When it does happen how does the
> remote side then gets notified or should it even get notified?

It should never happen, as there are checks when enqueuing that the transport queue is valid.  It is more of a sanity check than anything else.

> 'which should never happen'? FYI - I have seen and debugged(not this
> one but doorbells and what not) weirdness while working on CLARiiON +
> PCie-interconnects. Board bring-up is a PITA. So you get the idea ...

I understand, that is why I had the sanity check code.  Paranoia can be a good thing.
 
> >>
> >> > +void *ntb_transport_tx_dequeue(struct ntb_transport_qp *qp, unsigned int *len)
> >> > +{
> >> > +       struct ntb_queue_entry *entry;
> >> > +       void *buf;
> >> > +
> >> > +       if (!qp)
> >> > +               return NULL;
> >> > +
> >> > +       entry = ntb_list_rm_head(&qp->txc_lock, &qp->txc);
> >> > +       if (!entry)
> >> > +               return NULL;
> >> > +
> >> > +       buf = entry->callback_data;
> >> > +       if (entry->flags != HW_ERROR_FLAG)
> >> > +               *len = entry->len;
> >> > +       else
> >> > +               *len = -EIO;
> >> > +
> >> > +       ntb_list_add_tail(&qp->txe_lock, &entry->entry, &qp->txe);
> >>
> >> how about a ntb_list_add_head?
> >
> > Is there a benefit to adding to the head versus tail?  It makes more sense to me to pull from the head and add to the tail.
> >
> 
> Yes, explained above. Today there are 100(..DEF_NUM...) entries.
> Tomorrow there could be more. So why not re-use what's hot? You could
> also think of this as yet another way of forcing detection of
> double-use corruption/bug. I'm not saying that there's a bug here but
> you get the idea.

I'm fairly sure the list is suboptimal performance-wise, but it is much easier to use when developing.  Perhaps I should use a fixed size array instead.

Thanks for the reviews!

> 
> Chetan Loke

^ permalink raw reply

* Re: [PATCH 1/4] pch_gbe: Fix the checksum fill to the error location
From: Paul Gortmaker @ 2012-07-17  0:59 UTC (permalink / raw)
  To: Andy Cress; +Cc: netdev
In-Reply-To: <40680C535D6FE6498883F1640FACD44D011432D4@ka-exchange-1.kontronamerica.local>

Hi Andy,

On Mon, Jul 16, 2012 at 4:03 PM, Andy Cress <andy.cress@us.kontron.com> wrote:
>
> Author: Zhong Hongbo <hongbo.zhong@windriver.com>

I'm curious as to how you are generating these mails, since if it
were with the "normal" method, then the above would have a
"From: " prefix, and not "Author: " prefix.  (More on what is the
"normal" method below...)

>
> Due to some unknown hardware limitations the pch_gbe hardware cannot
> calculate checksums when the length of network package is less
> than 64 bytes, where we will surprisingly encounter a problem of
> the destination IP incorrectly changed.

This has the feel of a problem that isn't properly understood, and
a solution that is aimed at masking the symptom at the point
where it rears its ugly head...  Not a good sign, when it comes to
requesting upstream acceptance.

>
> When forwarding network packages at the network layer the IP packages

s/packages/packets/

> won't be relayed to the upper transport layer and analyzed there,
> consequently, skb->transport_header pointer will be mistakenly remained
> the same as that of skb->network_header, resulting in TCP checksum
> wrongly
> filled into the field of destination IP in IP header.

Similarly here, the one-word-per-line symptom seems to hint at it
being a case of data manually entered into a GUI MUA, which
generally translates into headaches for the maintainer who has
to integrate your work.  If you can adopt a "standard" workflow,
it will be easier for both you and the maintainer.  And that
"standard" workflow could be something as simple as this:

--------------------------
cd /path/to/my/git-repo/of/net-next
git pull
git checkout -b pch-fixes master
<create commits, by cherry-pick or manual creation>
<validate with compilation, boot testing>
git format-patch --subject-prefix="PATCH net-next" --cover-letter -o
sendme master..pch-fixes
vi sendme/0000-cover-letter.patch
<enter text describing your series of patches>
git send-email --to netdev@vger.kernel.org -cc maintainer@someaddress.com sendme
-------------------------

>
> We can fix this issue by manually calculate the offset of the TCP
> checksum
>  and update it accordingly.
>
> We would normally use the skb_checksum_start_offset(skb) here, but in
> this
> case it is sometimes -2 (csum_start=0 - skb_headroom=2 => -2), hence the
> manual calculation.
>
> Signed-off-by: Zhong Hongbo <hongbo.zhong@windriver.com>
> Merged-by: Andy Cress <andy.cress@us.kontron.com>

There really isn't a precedent for Merged-by: -- you may want to see:

     http://kerneltrap.org/taxonomy/term/245

A "real" merge actually creates a merge commit, which is unique
in its own right.  So here, if you've carried forward an older patch,
or similar, you'd be most likely to add your own SOB line underneath.

Also a Cc: addition might be in order, say based on the output of:

./scripts/get_maintainer.pl -f drivers/net/ethernet/oki-semi/pch_gbe

A Cc: line put here goes on record.  If you use the --cc option
mentioned above, the Cc: record is limited to the mail header in
the netdev mail archive.

>
> ---
>  drivers/net/pch_gbe/pch_gbe_main.c |   14 ++++++++------
>  1 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> index 3787c64..1642bff 100644
> --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> @@ -1178,32 +1178,35 @@ static void pch_gbe_tx_queue(struct
> pch_gbe_adapter *adapter,
>         /*
>          * It is because the hardware accelerator does not support a
> checksum,
>          * when the received data size is less than 64 bytes.
> +        * Note: skb_checksum_start_offset(skb) is sometimes -2 here.

This comment is borderline useless, without some level of justification
of how/why that happens, and why it is OK/acceptable.

>          */
>         if (skb->len < PCH_GBE_SHORT_PKT && skb->ip_summed !=
> CHECKSUM_NONE) {
> +               struct iphdr *iph = ip_hdr(skb);
>                 frame_ctrl |= PCH_GBE_TXD_CTRL_APAD |
>                               PCH_GBE_TXD_CTRL_TCPIP_ACC_OFF;
>                 if (skb->protocol == htons(ETH_P_IP)) {
> -                       struct iphdr *iph = ip_hdr(skb);
>                         unsigned int offset;
> -                       offset = skb_transport_offset(skb);
> +                       offset = (unsigned char *)((u8 *)iph + iph->ihl
> * 4) - skb->data;
>                         if (iph->protocol == IPPROTO_TCP) {
> +                               struct tcphdr *tcphdr_point = (struct
> tcphdr *)((u8 *)iph + iph->ihl * 4);

There are several things going on here that are not ideal.  There
are casts of casts with unexplained math magic, there is the
attempt to encode variable types ("_point") in the variable name,
and there is apparently mailer munging which inserts rogue newlines.

At this point, I'd have to suggest that you go back to trying to
describe the use case, and the exact symptoms you are seeing
there; describe that and how you believe it comes about as
articulately as possible, and ask for input on how best to deal
with it; giving reference to this as a workaround that was deployed
some time in the past. In the end, this is a recipe for a good
commit log anyways, i.e. describing the end user symptom,
the use case that triggers it, the underlying reason as to why
it happens, and finally the technically correct fix to deal with
solving it (and indicate why it is the correct fix, if not obvious.)

Hope this helps with basic process and expectations.

Paul.
--

>                                 skb->csum = 0;
> -                               tcp_hdr(skb)->check = 0;
> +                               tcphdr_point->check = 0;
>                                 skb->csum = skb_checksum(skb, offset,
>                                                          skb->len -
> offset, 0);
> -                               tcp_hdr(skb)->check =
> +                               tcphdr_point->check =
>                                         csum_tcpudp_magic(iph->saddr,
>                                                           iph->daddr,
>                                                           skb->len -
> offset,
>                                                           IPPROTO_TCP,
>                                                           skb->csum);
>                         } else if (iph->protocol == IPPROTO_UDP) {
> +                               struct udphdr *udphdr_point = (struct
> udphdr *)((u8 *)iph + iph->ihl * 4);
>                                 skb->csum = 0;
> -                               udp_hdr(skb)->check = 0;
> +                               udphdr_point->check = 0;
>                                 skb->csum =
>                                         skb_checksum(skb, offset,
>                                                      skb->len - offset,
> 0);
> -                               udp_hdr(skb)->check =
> +                               udphdr_point->check =
>                                         csum_tcpudp_magic(iph->saddr,
>                                                           iph->daddr,
>                                                           skb->len -
> offset,
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [net] ixgbevf: Prevent RX/TX statistics getting reset to zero
From: Jeff Kirsher @ 2012-07-17  1:24 UTC (permalink / raw)
  To: davem; +Cc: Narendra K, netdev, gospo, sassmann, Jeff Kirsher

From: Narendra K <narendra_k@dell.com>

The commit 4197aa7bb81877ebb06e4f2cc1b5fea2da23a7bd implements 64 bit
per ring statistics. But the driver resets the 'total_bytes' and
'total_packets' from RX and TX rings in the RX and TX interrupt
handlers to zero. This results in statistics being lost and user space
reporting RX and TX statistics as zero. This patch addresses the
issue by preventing the resetting of RX and TX ring statistics to
zero.

Signed-off-by: Narendra K <narendra_k@dell.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index f69ec42..8b304a4 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -969,8 +969,6 @@ static irqreturn_t ixgbevf_msix_clean_tx(int irq, void *data)
 	r_idx = find_first_bit(q_vector->txr_idx, adapter->num_tx_queues);
 	for (i = 0; i < q_vector->txr_count; i++) {
 		tx_ring = &(adapter->tx_ring[r_idx]);
-		tx_ring->total_bytes = 0;
-		tx_ring->total_packets = 0;
 		ixgbevf_clean_tx_irq(adapter, tx_ring);
 		r_idx = find_next_bit(q_vector->txr_idx, adapter->num_tx_queues,
 				      r_idx + 1);
@@ -994,16 +992,6 @@ static irqreturn_t ixgbevf_msix_clean_rx(int irq, void *data)
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct ixgbevf_ring  *rx_ring;
 	int r_idx;
-	int i;
-
-	r_idx = find_first_bit(q_vector->rxr_idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rxr_count; i++) {
-		rx_ring = &(adapter->rx_ring[r_idx]);
-		rx_ring->total_bytes = 0;
-		rx_ring->total_packets = 0;
-		r_idx = find_next_bit(q_vector->rxr_idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
 
 	if (!q_vector->rxr_count)
 		return IRQ_HANDLED;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] ipv6: fix incorrect route 'expires' value passed to userspace.
From: Li Wei @ 2012-07-17  1:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20120716094124.0040561f@s6510.linuxnetplumber.net>

于 2012-7-17 0:41, Stephen Hemminger 写道:
> On Mon, 16 Jul 2012 16:09:37 +0800
> Li Wei <lw@cn.fujitsu.com> wrote:
> 
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index becb048..a7fec9d 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -2516,7 +2516,7 @@ static int rt6_fill_node(struct net *net,
>>  		goto nla_put_failure;
>>  	if (!(rt->rt6i_flags & RTF_EXPIRES))
>>  		expires = 0;
>> -	else if (rt->dst.expires - jiffies < INT_MAX)
>> +	else if ((int)(rt->dst.expires - jiffies) < INT_MAX)
>>  		expires = rt->dst.expires - jiffies;
>>  	else
>>  		expires = INT_MAX;
> 
> Why not use time_is_after_jiffies() macro?

time_is_after_jiffies() return a bool but we need "how much time
before/after jiffies" here.

> 
> 

However, I also think these code seems a little ugly, because we
need to store the result of two "unsigned long"'s subtraction into
an integer. Maybe we should distinguish expires before and after 
jiffies to proper process the overflows.

Thanks,
Wei

^ permalink raw reply

* Re: [PATCH] ipv6: fix incorrect route 'expires' value passed to userspace.
From: Li Wei @ 2012-07-17  1:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, shemminger
In-Reply-To: <20120716.025649.1070277404591664104.davem@davemloft.net>

于 2012-7-16 17:56, David Miller 写道:
> 
> Without a proper signoff, I won't apply your patch.
> 
> 

Sorry for that, I will append a signoff, do some code refactor
and send a V2.

Thanks,
Wei

^ permalink raw reply

* [PATCH 0/4 v3 net-next] tg3: Add hwmon support
From: Michael Chan @ 2012-07-17  2:23 UTC (permalink / raw)
  To: davem; +Cc: netdev


David, I've removed the binary sysfs attribute and now use
hwmon only.  Please consider this patchset for net-next.

^ permalink raw reply

* [PATCH 4/4 v3 net-next] tg3: Add hwmon support for temperature
From: Michael Chan @ 2012-07-17  2:24 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1342491842-29818-4-git-send-email-mchan@broadcom.com>

Some tg3 devices have management firmware that can export sensor data.
Export temperature sensor reading via hwmon sysfs.

[hwmon interface suggested by Ben Hutchings <bhutchings@solarflare.com>]

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |  112 +++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/tg3.h |   38 ++++++++++++
 2 files changed, 150 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 26ca368..fce4c1e 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -44,6 +44,10 @@
 #include <linux/prefetch.h>
 #include <linux/dma-mapping.h>
 #include <linux/firmware.h>
+#if IS_ENABLED(CONFIG_HWMON)
+#include <linux/hwmon.h>
+#include <linux/hwmon-sysfs.h>
+#endif
 
 #include <net/checksum.h>
 #include <net/ip.h>
@@ -9481,6 +9485,110 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
 	return tg3_reset_hw(tp, reset_phy);
 }
 
+#if IS_ENABLED(CONFIG_HWMON)
+static void tg3_sd_scan_scratchpad(struct tg3 *tp, struct tg3_ocir *ocir)
+{
+	int i;
+
+	for (i = 0; i < TG3_SD_NUM_RECS; i++, ocir++) {
+		u32 off = i * TG3_OCIR_LEN, len = TG3_OCIR_LEN;
+
+		tg3_ape_scratchpad_read(tp, (u32 *) ocir, off, len);
+		off += len;
+
+		if (ocir->signature != TG3_OCIR_SIG_MAGIC ||
+		    !(ocir->version_flags & TG3_OCIR_FLAG_ACTIVE))
+			memset(ocir, 0, TG3_OCIR_LEN);
+	}
+}
+
+/* sysfs attributes for hwmon */
+static ssize_t tg3_show_temp(struct device *dev,
+			     struct device_attribute *devattr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct tg3 *tp = netdev_priv(netdev);
+	struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
+	u32 temperature;
+
+	spin_lock_bh(&tp->lock);
+	tg3_ape_scratchpad_read(tp, &temperature, attr->index,
+				sizeof(temperature));
+	spin_unlock_bh(&tp->lock);
+	return sprintf(buf, "%u\n", temperature);
+}
+
+
+static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, tg3_show_temp, NULL,
+			  TG3_TEMP_SENSOR_OFFSET);
+static SENSOR_DEVICE_ATTR(temp1_crit, S_IRUGO, tg3_show_temp, NULL,
+			  TG3_TEMP_CAUTION_OFFSET);
+static SENSOR_DEVICE_ATTR(temp1_max, S_IRUGO, tg3_show_temp, NULL,
+			  TG3_TEMP_MAX_OFFSET);
+
+static struct attribute *tg3_attributes[] = {
+	&sensor_dev_attr_temp1_input.dev_attr.attr,
+	&sensor_dev_attr_temp1_crit.dev_attr.attr,
+	&sensor_dev_attr_temp1_max.dev_attr.attr,
+	NULL
+};
+
+static const struct attribute_group tg3_group = {
+	.attrs = tg3_attributes,
+};
+
+#endif
+
+static void tg3_hwmon_close(struct tg3 *tp)
+{
+#if IS_ENABLED(CONFIG_HWMON)
+	if (tp->hwmon_dev) {
+		hwmon_device_unregister(tp->hwmon_dev);
+		tp->hwmon_dev = NULL;
+		sysfs_remove_group(&tp->pdev->dev.kobj, &tg3_group);
+	}
+#endif
+}
+
+static void tg3_hwmon_open(struct tg3 *tp)
+{
+#if IS_ENABLED(CONFIG_HWMON)
+	int i, err;
+	u32 size = 0;
+	struct pci_dev *pdev = tp->pdev;
+	struct tg3_ocir ocirs[TG3_SD_NUM_RECS];
+
+	tg3_sd_scan_scratchpad(tp, ocirs);
+
+	for (i = 0; i < TG3_SD_NUM_RECS; i++) {
+		if (!ocirs[i].src_data_length)
+			continue;
+
+		size += ocirs[i].src_hdr_length;
+		size += ocirs[i].src_data_length;
+	}
+
+	if (!size)
+		return;
+
+	/* Register hwmon sysfs hooks */
+	err = sysfs_create_group(&pdev->dev.kobj, &tg3_group);
+	if (err) {
+		dev_err(&pdev->dev, "Cannot create sysfs group, aborting\n");
+		return;
+	}
+
+	tp->hwmon_dev = hwmon_device_register(&pdev->dev);
+	if (IS_ERR(tp->hwmon_dev)) {
+		tp->hwmon_dev = NULL;
+		dev_err(&pdev->dev, "Cannot register hwmon device, aborting\n");
+		sysfs_remove_group(&pdev->dev.kobj, &tg3_group);
+	}
+#endif
+}
+
+
 #define TG3_STAT_ADD32(PSTAT, REG) \
 do {	u32 __val = tr32(REG); \
 	(PSTAT)->low += __val; \
@@ -10189,6 +10297,8 @@ static int tg3_open(struct net_device *dev)
 
 	tg3_phy_start(tp);
 
+	tg3_hwmon_open(tp);
+
 	tg3_full_lock(tp, 0);
 
 	tg3_timer_start(tp);
@@ -10238,6 +10348,8 @@ static int tg3_close(struct net_device *dev)
 
 	tg3_timer_stop(tp);
 
+	tg3_hwmon_close(tp);
+
 	tg3_phy_stop(tp);
 
 	tg3_full_lock(tp, 1);
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index f8a0d9c..a1b75cd 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -2676,6 +2676,40 @@ struct tg3_hw_stats {
 	u8				__reserved4[0xb00-0x9c8];
 };
 
+#define TG3_SD_NUM_RECS			3
+#define TG3_OCIR_LEN			(sizeof(struct tg3_ocir))
+#define TG3_OCIR_SIG_MAGIC		0x5253434f
+#define TG3_OCIR_FLAG_ACTIVE		0x00000001
+
+#define TG3_TEMP_CAUTION_OFFSET		0xc8
+#define TG3_TEMP_MAX_OFFSET		0xcc
+#define TG3_TEMP_SENSOR_OFFSET		0xd4
+
+
+struct tg3_ocir {
+	u32				signature;
+	u16				version_flags;
+	u16				refresh_int;
+	u32				refresh_tmr;
+	u32				update_tmr;
+	u32				dst_base_addr;
+	u16				src_hdr_offset;
+	u16				src_hdr_length;
+	u16				src_data_offset;
+	u16				src_data_length;
+	u16				dst_hdr_offset;
+	u16				dst_data_offset;
+	u16				dst_reg_upd_offset;
+	u16				dst_sem_offset;
+	u32				reserved1[2];
+	u32				port0_flags;
+	u32				port1_flags;
+	u32				port2_flags;
+	u32				port3_flags;
+	u32				reserved2[1];
+};
+
+
 /* 'mapping' is superfluous as the chip does not write into
  * the tx/rx post rings so we could just fetch it from there.
  * But the cache behavior is better how we are doing it now.
@@ -3211,6 +3245,10 @@ struct tg3 {
 	const char			*fw_needed;
 	const struct firmware		*fw;
 	u32				fw_len; /* includes BSS */
+
+#if IS_ENABLED(CONFIG_HWMON)
+	struct device			*hwmon_dev;
+#endif
 };
 
 #endif /* !(_T3_H) */
-- 
1.7.1

^ permalink raw reply related

* [PATCH 3/4 v3 net-next] tg3: Add APE scratchpad read function
From: Michael Chan @ 2012-07-17  2:24 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1342491842-29818-3-git-send-email-mchan@broadcom.com>

From: Matt Carlson <mcarlson@broadcom.com>

for retreiving temperature sensor data.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |   79 +++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/tg3.h |    9 +++-
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 5dda3ce..26ca368 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -752,6 +752,85 @@ static int tg3_ape_event_lock(struct tg3 *tp, u32 timeout_us)
 	return timeout_us ? 0 : -EBUSY;
 }
 
+static int tg3_ape_wait_for_event(struct tg3 *tp, u32 timeout_us)
+{
+	u32 i, apedata;
+
+	for (i = 0; i < timeout_us / 10; i++) {
+		apedata = tg3_ape_read32(tp, TG3_APE_EVENT_STATUS);
+
+		if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
+			break;
+
+		udelay(10);
+	}
+
+	return i == timeout_us / 10;
+}
+
+int tg3_ape_scratchpad_read(struct tg3 *tp, u32 *data, u32 base_off, u32 len)
+{
+	int err;
+	u32 i, bufoff, msgoff, maxlen, apedata;
+
+	if (!tg3_flag(tp, APE_HAS_NCSI))
+		return 0;
+
+	apedata = tg3_ape_read32(tp, TG3_APE_SEG_SIG);
+	if (apedata != APE_SEG_SIG_MAGIC)
+		return -ENODEV;
+
+	apedata = tg3_ape_read32(tp, TG3_APE_FW_STATUS);
+	if (!(apedata & APE_FW_STATUS_READY))
+		return -EAGAIN;
+
+	bufoff = tg3_ape_read32(tp, TG3_APE_SEG_MSG_BUF_OFF) +
+		 TG3_APE_SHMEM_BASE;
+	msgoff = bufoff + 2 * sizeof(u32);
+	maxlen = tg3_ape_read32(tp, TG3_APE_SEG_MSG_BUF_LEN);
+
+	while (len) {
+		u32 length;
+
+		/* Cap xfer sizes to scratchpad limits. */
+		length = (len > maxlen) ? maxlen : len;
+		len -= length;
+
+		apedata = tg3_ape_read32(tp, TG3_APE_FW_STATUS);
+		if (!(apedata & APE_FW_STATUS_READY))
+			return -EAGAIN;
+
+		/* Wait for up to 1 msec for APE to service previous event. */
+		err = tg3_ape_event_lock(tp, 1000);
+		if (err)
+			return err;
+
+		apedata = APE_EVENT_STATUS_DRIVER_EVNT |
+			  APE_EVENT_STATUS_SCRTCHPD_READ |
+			  APE_EVENT_STATUS_EVENT_PENDING;
+		tg3_ape_write32(tp, TG3_APE_EVENT_STATUS, apedata);
+
+		tg3_ape_write32(tp, bufoff, base_off);
+		tg3_ape_write32(tp, bufoff + sizeof(u32), length);
+
+		tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+		tg3_ape_write32(tp, TG3_APE_EVENT, APE_EVENT_1);
+
+		base_off += length;
+
+		if (tg3_ape_wait_for_event(tp, 30000))
+			return -EAGAIN;
+
+		for (i = 0; length; i += 4, length -= 4) {
+			u32 val = tg3_ape_read32(tp, msgoff + i);
+			memcpy(data, &val, sizeof(u32));
+			data++;
+		}
+	}
+
+	return 0;
+}
+
 static int tg3_ape_send_event(struct tg3 *tp, u32 event)
 {
 	int err;
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 93865f8..f8a0d9c 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -2311,10 +2311,11 @@
 #define  APE_LOCK_REQ_DRIVER		 0x00001000
 #define TG3_APE_LOCK_GRANT		0x004c
 #define  APE_LOCK_GRANT_DRIVER		 0x00001000
-#define TG3_APE_SEG_SIG			0x4000
-#define  APE_SEG_SIG_MAGIC		 0x41504521
 
 /* APE shared memory.  Accessible through BAR1 */
+#define TG3_APE_SHMEM_BASE		0x4000
+#define TG3_APE_SEG_SIG			0x4000
+#define  APE_SEG_SIG_MAGIC		 0x41504521
 #define TG3_APE_FW_STATUS		0x400c
 #define  APE_FW_STATUS_READY		 0x00000100
 #define TG3_APE_FW_FEATURES		0x4010
@@ -2327,6 +2328,8 @@
 #define  APE_FW_VERSION_REVMSK		 0x0000ff00
 #define  APE_FW_VERSION_REVSFT		 8
 #define  APE_FW_VERSION_BLDMSK		 0x000000ff
+#define TG3_APE_SEG_MSG_BUF_OFF		0x401c
+#define TG3_APE_SEG_MSG_BUF_LEN		0x4020
 #define TG3_APE_HOST_SEG_SIG		0x4200
 #define  APE_HOST_SEG_SIG_MAGIC		 0x484f5354
 #define TG3_APE_HOST_SEG_LEN		0x4204
@@ -2353,6 +2356,8 @@
 
 #define  APE_EVENT_STATUS_DRIVER_EVNT	 0x00000010
 #define  APE_EVENT_STATUS_STATE_CHNGE	 0x00000500
+#define  APE_EVENT_STATUS_SCRTCHPD_READ	 0x00001600
+#define  APE_EVENT_STATUS_SCRTCHPD_WRITE 0x00001700
 #define  APE_EVENT_STATUS_STATE_START	 0x00010000
 #define  APE_EVENT_STATUS_STATE_UNLOAD	 0x00020000
 #define  APE_EVENT_STATUS_STATE_WOL	 0x00030000
-- 
1.7.1

^ permalink raw reply related

* [PATCH 2/4 v3 net-next] tg3: Add common function tg3_ape_event_lock()
From: Michael Chan @ 2012-07-17  2:24 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1342491842-29818-2-git-send-email-mchan@broadcom.com>

From: Matt Carlson <mcarlson@broadcom.com>

by refactoring code in tg3_ape_send_event().  The common function will
be used in subsequent patches.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |   56 ++++++++++++++++++++---------------
 1 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 45dc6f5..5dda3ce 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -731,44 +731,52 @@ static void tg3_ape_unlock(struct tg3 *tp, int locknum)
 	tg3_ape_write32(tp, gnt + 4 * locknum, bit);
 }
 
-static void tg3_ape_send_event(struct tg3 *tp, u32 event)
+static int tg3_ape_event_lock(struct tg3 *tp, u32 timeout_us)
 {
-	int i;
 	u32 apedata;
 
-	/* NCSI does not support APE events */
-	if (tg3_flag(tp, APE_HAS_NCSI))
-		return;
+	while (timeout_us) {
+		if (tg3_ape_lock(tp, TG3_APE_LOCK_MEM))
+			return -EBUSY;
+
+		apedata = tg3_ape_read32(tp, TG3_APE_EVENT_STATUS);
+		if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
+			break;
+
+		tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+
+		udelay(10);
+		timeout_us -= (timeout_us > 10) ? 10 : timeout_us;
+	}
+
+	return timeout_us ? 0 : -EBUSY;
+}
+
+static int tg3_ape_send_event(struct tg3 *tp, u32 event)
+{
+	int err;
+	u32 apedata;
 
 	apedata = tg3_ape_read32(tp, TG3_APE_SEG_SIG);
 	if (apedata != APE_SEG_SIG_MAGIC)
-		return;
+		return -EAGAIN;
 
 	apedata = tg3_ape_read32(tp, TG3_APE_FW_STATUS);
 	if (!(apedata & APE_FW_STATUS_READY))
-		return;
+		return -EAGAIN;
 
 	/* Wait for up to 1 millisecond for APE to service previous event. */
-	for (i = 0; i < 10; i++) {
-		if (tg3_ape_lock(tp, TG3_APE_LOCK_MEM))
-			return;
-
-		apedata = tg3_ape_read32(tp, TG3_APE_EVENT_STATUS);
-
-		if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
-			tg3_ape_write32(tp, TG3_APE_EVENT_STATUS,
-					event | APE_EVENT_STATUS_EVENT_PENDING);
+	err = tg3_ape_event_lock(tp, 1000);
+	if (err)
+		return err;
 
-		tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+	tg3_ape_write32(tp, TG3_APE_EVENT_STATUS,
+			event | APE_EVENT_STATUS_EVENT_PENDING);
 
-		if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
-			break;
+	tg3_ape_unlock(tp, TG3_APE_LOCK_MEM);
+	tg3_ape_write32(tp, TG3_APE_EVENT, APE_EVENT_1);
 
-		udelay(100);
-	}
-
-	if (!(apedata & APE_EVENT_STATUS_EVENT_PENDING))
-		tg3_ape_write32(tp, TG3_APE_EVENT, APE_EVENT_1);
+	return 0;
 }
 
 static void tg3_ape_driver_state_change(struct tg3 *tp, int kind)
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/4 v3 net-next] tg3: Fix the setting of the APE_HAS_NCSI flag
From: Michael Chan @ 2012-07-17  2:23 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1342491842-29818-1-git-send-email-mchan@broadcom.com>

The driver currently skips setting this flag if the VPD contains the
firmware version string.  We fix this by separating the probing of NCSI
from the reading of the NCSI version string.  The APE_HAS_NCSI flag is
needed to properly read sensor data.

Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |   42 +++++++++++++++++-----------------
 1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index ac9091f..45dc6f5 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -13858,14 +13858,9 @@ static void __devinit tg3_read_mgmtfw_ver(struct tg3 *tp)
 	}
 }
 
-static void __devinit tg3_read_dash_ver(struct tg3 *tp)
+static void __devinit tg3_probe_ncsi(struct tg3 *tp)
 {
-	int vlen;
 	u32 apedata;
-	char *fwtype;
-
-	if (!tg3_flag(tp, ENABLE_APE) || !tg3_flag(tp, ENABLE_ASF))
-		return;
 
 	apedata = tg3_ape_read32(tp, TG3_APE_SEG_SIG);
 	if (apedata != APE_SEG_SIG_MAGIC)
@@ -13875,14 +13870,22 @@ static void __devinit tg3_read_dash_ver(struct tg3 *tp)
 	if (!(apedata & APE_FW_STATUS_READY))
 		return;
 
+	if (tg3_ape_read32(tp, TG3_APE_FW_FEATURES) & TG3_APE_FW_FEATURE_NCSI)
+		tg3_flag_set(tp, APE_HAS_NCSI);
+}
+
+static void __devinit tg3_read_dash_ver(struct tg3 *tp)
+{
+	int vlen;
+	u32 apedata;
+	char *fwtype;
+
 	apedata = tg3_ape_read32(tp, TG3_APE_FW_VERSION);
 
-	if (tg3_ape_read32(tp, TG3_APE_FW_FEATURES) & TG3_APE_FW_FEATURE_NCSI) {
-		tg3_flag_set(tp, APE_HAS_NCSI);
+	if (tg3_flag(tp, APE_HAS_NCSI))
 		fwtype = "NCSI";
-	} else {
+	else
 		fwtype = "DASH";
-	}
 
 	vlen = strlen(tp->fw_ver);
 
@@ -13916,20 +13919,17 @@ static void __devinit tg3_read_fw_ver(struct tg3 *tp)
 		tg3_read_sb_ver(tp, val);
 	else if ((val & TG3_EEPROM_MAGIC_HW_MSK) == TG3_EEPROM_MAGIC_HW)
 		tg3_read_hwsb_ver(tp);
-	else
-		return;
 
-	if (vpd_vers)
-		goto done;
-
-	if (tg3_flag(tp, ENABLE_APE)) {
-		if (tg3_flag(tp, ENABLE_ASF))
-			tg3_read_dash_ver(tp);
-	} else if (tg3_flag(tp, ENABLE_ASF)) {
-		tg3_read_mgmtfw_ver(tp);
+	if (tg3_flag(tp, ENABLE_ASF)) {
+		if (tg3_flag(tp, ENABLE_APE)) {
+			tg3_probe_ncsi(tp);
+			if (!vpd_vers)
+				tg3_read_dash_ver(tp);
+		} else if (!vpd_vers) {
+			tg3_read_mgmtfw_ver(tp);
+		}
 	}
 
-done:
 	tp->fw_ver[TG3_VER_SIZE - 1] = 0;
 }
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] net-next: make sock diag per-namespace (v2)
From: Pavel Emelyanov @ 2012-07-17  2:34 UTC (permalink / raw)
  To: Andrew Vagin, David S. Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Eric Dumazet, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <1342448929-1809316-1-git-send-email-avagin@openvz.org>

On 07/16/2012 06:28 PM, Andrew Vagin wrote:
> Before this patch sock_diag works for init_net only and dumps
> information about sockets from all namespaces.
> 
> This patch expands sock_diag for all name-spaces.
> It creates a netlink kernel socket for each netns and filters
> data during dumping.
> 
> v2: filter accoding with netns in all places
>     remove an unused variable.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: James Morris <jmorris@namei.org>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Patrick McHardy <kaber@trash.net>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Andrew Vagin <avagin@openvz.org>

Acked-by: Pavel Emelyanov <xemul@parallels.com>

^ permalink raw reply

* Re: [PATCH 2/3] ipvs: add missing lock in ip_vs_ftp_init_conn()
From: Xiaotian Feng @ 2012-07-17  2:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Simon Horman, lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer,
	Xiaotian Feng, Patrick McHardy, David S. Miller
In-Reply-To: <20120716210757.GA18026@1984>

On Tue, Jul 17, 2012 at 5:07 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Simon,
>
> On Wed, Jul 11, 2012 at 09:19:22AM +0900, Simon Horman wrote:
>> From: Xiaotian Feng <xtfeng@gmail.com>
>>
>> We met a kernel panic in 2.6.32.43 kernel:
> [...]
>>  net/netfilter/ipvs/ip_vs_ftp.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
>> index b20b29c..c2bc264 100644
>> --- a/net/netfilter/ipvs/ip_vs_ftp.c
>> +++ b/net/netfilter/ipvs/ip_vs_ftp.c
>> @@ -65,8 +65,10 @@ static int ip_vs_ftp_pasv;
>>  static int
>>  ip_vs_ftp_init_conn(struct ip_vs_app *app, struct ip_vs_conn *cp)
>>  {
>> +     spin_lock(&cp->lock);
>>       /* We use connection tracking for the command connection */
>>       cp->flags |= IP_VS_CONN_F_NFCT;
>> +     spin_unlock(&cp->lock);
>>       return 0;
>
> The conntrack support for FTP IPVS helper seems to be there since
> 2.6.37.
>
> However, the patch description mentions 2.6.32.43.
>
> Something doesn't match here, could you clarify this?
>

Sorry for the misleading description in the patch. We found the panic
in 2.6.32.43 is caused by changing cp->flags without protection. In
2.6.32.43, ip_vs_process_message changes cp->flags without protection
while update active/inactive flags for the connection.

After code inspiration, we found in 3.x kernel, it is accidentally
fixed by commit  f73181c. But with ip_vs_app changes,
ip_vs_ftp_init_conn() will have chance to change cp->flags without
protection. So it is a potential bug in 3.x kernel.


> Thanks.

^ permalink raw reply

* Re: 82571EB: Detected Hardware Unit Hang
From: Jon Mason @ 2012-07-17  4:45 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Henrique de Moraes Holschuh, Dave, Tushar N, Joe Jin,
	e1000-devel@lists.sf.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1342453645.2523.17.camel@bwh-desktop.uk.solarflarecom.com>

On Mon, Jul 16, 2012 at 8:47 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Sun, 2012-07-15 at 10:35 -0300, Henrique de Moraes Holschuh wrote:
>> On Sun, 15 Jul 2012, Dave, Tushar N wrote:
>> > Somehow setting max payload to 256 from BIOS does not set this value for all devices. I believe this is a BIOS bug.
>>
>> And preferably, Linux should complain about it.  Since we know it is going
>> to cause problems, and since we know it does happen, we should be raising a
>> ruckus about it in the kernel log (and probably fixing it to min(path) while
>> at it)...
>>
>> Is this something that should be raised as a feature request with the
>> PCI/PCIe subsystem?
>
> The feature is there, but we ended up with:
>
> commit 5f39e6705faade2e89d119958a8c51b9b6e2c53c
> Author: Jon Mason <mason@myri.com>
> Date:   Mon Oct 3 09:50:20 2011 -0500
>
>     PCI: Disable MPS configuration by default

With the quirk, it should work now if pcie_bus_config is set to
PCIE_BUS_SAFE.  With that patch was pushed it was too late in the
release to fix it and see if there were any other ones out there (or
incur the wrath of Linus).  If you are brave enough, you can enable it
by default again and see if there are any other quirks out there. ;-)

> But you are welcome to share use of the fixup_mpss_256() quirk.
>
> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: 82571EB: Detected Hardware Unit Hang
From: Jon Mason @ 2012-07-17  4:48 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Ben Hutchings, Dave, Tushar N, Joe Jin, e1000-devel@lists.sf.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20120716160807.GB1765@khazad-dum.debian.net>

On Mon, Jul 16, 2012 at 9:08 AM, Henrique de Moraes Holschuh
<hmh@hmh.eng.br> wrote:
> On Mon, 16 Jul 2012, Ben Hutchings wrote:
>> On Sun, 2012-07-15 at 10:35 -0300, Henrique de Moraes Holschuh wrote:
>> > On Sun, 15 Jul 2012, Dave, Tushar N wrote:
>> > > Somehow setting max payload to 256 from BIOS does not set this value for all devices. I believe this is a BIOS bug.
>> >
>> > And preferably, Linux should complain about it.  Since we know it is going
>> > to cause problems, and since we know it does happen, we should be raising a
>> > ruckus about it in the kernel log (and probably fixing it to min(path) while
>> > at it)...
>> >
>> > Is this something that should be raised as a feature request with the
>> > PCI/PCIe subsystem?
>>
>> The feature is there, but we ended up with:
>>
>> commit 5f39e6705faade2e89d119958a8c51b9b6e2c53c
>> Author: Jon Mason <mason@myri.com>
>> Date:   Mon Oct 3 09:50:20 2011 -0500
>>
>>     PCI: Disable MPS configuration by default
>>
>> But you are welcome to share use of the fixup_mpss_256() quirk.
>
> Meh.  I'd be happy with a warning if MPSS decreases when walking up to
> the tree root... i.e. a warning if any child has a MPSS larger than the
> parent.

You can add "pci=pcie_bus_safe" to the kernel params and it should
resolve your issue.

> --
>   "One disk to rule them all, One disk to find them. One disk to bring
>   them all and in the darkness grind them. In the Land of Redmond
>   where the shadows lie." -- The Silicon Valley Tarot
>   Henrique Holschuh
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: timer expiry check at icmp.c in ipv6
From: BALAKUMARAN KANNAN @ 2012-07-17  5:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <1342449217.2830.3.camel@edumazet-glaptop>

Dear Eric,
    Can you please explain it into bit further. Actually what I want to know is how the preferred lifetime value of a router advertisement is processed?, Where it is stored? and while sending the ICMP_REPLY how that preferred lifetime value considered before actually sending the packet? I cannot able to find the relation between icmpv6_echo_reply and fib6_run_gc(). I am very new to kernel source. Kindly help me.

thank you.
________________________________________
From: netdev-owner@vger.kernel.org [netdev-owner@vger.kernel.org] on behalf of Eric Dumazet [eric.dumazet@gmail.com]
Sent: Monday, July 16, 2012 8:03 PM
To: BALAKUMARAN KANNAN
Cc: netdev@vger.kernel.org
Subject: Re: timer expiry check at icmp.c in ipv6

On Mon, 2012-07-16 at 13:44 +0000, BALAKUMARAN KANNAN wrote:
> Dear all,
>
> In kernel-3.0.26 code net/ipv6/icmp.c, while sending ICMP reply where
> it checks for the timer expiry. It should check the value given by a
> router advertisement. I think the expiry value is stored in
> rt->rt6_expires in ndisc.c (line no: 1284). Then while sending an ICMP
> reply, it should check with the expiry timer right? Where that check
> is happening? Please somebody explain me.
>
> Thank you.

Its probably done in net/ipv6/ip6_fib.c

fib6_gc_timer_cb() -> fib6_run_gc()

Every 30 seconds.

(you can change /proc/sys/net/ipv6/route/gc_interval )




^ permalink raw reply

* Re: [PATCH] netem: fix rate extension and drop accounting
From: Eric Dumazet @ 2012-07-17  5:12 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Mark Gordon, David Miller, netdev, Yuchung Cheng, Andreas Terzis
In-Reply-To: <20120716232600.GI3415@nuttenaction>

On Tue, 2012-07-17 at 01:26 +0200, Hagen Paul Pfeifer wrote:

> As Eric said: there are problems in combination with a static delay. During
> rate extension development we tested the raw/vanilla "rate" functionality.
> The rate part works faultless[TM] - at least independet of any other
> "delay-latency generator".

Well, to get correct rate, I use the following unofficial patch :

(Or else, the rate is really wrong)



diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index c412ad0..2740a75 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -433,16 +433,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 			delay += packet_len_2_sched_time(skb->len, q);
 
-			if (!skb_queue_empty(list)) {
-				/*
-				 * Last packet in queue is reference point (now).
-				 * First packet in queue is already in flight,
-				 * calculate this time bonus and substract
-				 * from delay.
-				 */
-				delay -= now - netem_skb_cb(skb_peek(list))->time_to_send;
+			if (!skb_queue_empty(list))
 				now = netem_skb_cb(skb_peek_tail(list))->time_to_send;
-			}
 		}
 
 		cb->time_to_send = now + delay;

^ permalink raw reply related

* Re: [PATCH net-next] tcp: add OFO snmp counters
From: David Miller @ 2012-07-17  5:12 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1342438896.23494.15.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 16 Jul 2012 13:41:36 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> Add three SNMP TCP counters, to better track TCP behavior
> at global stage (netstat -s), when packets are received
> Out Of Order (OFO)
> 
> TCPOFOQueue : Number of packets queued in OFO queue
> 
> TCPOFODrop  : Number of packets meant to be queued in OFO
>               but dropped because socket rcvbuf limit hit.
> 
> TCPOFOMerge : Number of packets in OFO that were merged with
>               other packets.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Looks good, applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH 2/3] ipvs: add missing lock in ip_vs_ftp_init_conn()
From: Simon Horman @ 2012-07-17  5:14 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Hans Schillstrom, Jesper Dangaard Brouer,
	Xiaotian Feng, Xiaotian Feng, Patrick McHardy, David S. Miller
In-Reply-To: <20120716210757.GA18026@1984>

On Mon, Jul 16, 2012 at 11:07:57PM +0200, Pablo Neira Ayuso wrote:
> Hi Simon,
> 
> On Wed, Jul 11, 2012 at 09:19:22AM +0900, Simon Horman wrote:
> > From: Xiaotian Feng <xtfeng@gmail.com>
> > 
> > We met a kernel panic in 2.6.32.43 kernel:
> [...]
> >  net/netfilter/ipvs/ip_vs_ftp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
> > index b20b29c..c2bc264 100644
> > --- a/net/netfilter/ipvs/ip_vs_ftp.c
> > +++ b/net/netfilter/ipvs/ip_vs_ftp.c
> > @@ -65,8 +65,10 @@ static int ip_vs_ftp_pasv;
> >  static int
> >  ip_vs_ftp_init_conn(struct ip_vs_app *app, struct ip_vs_conn *cp)
> >  {
> > +	spin_lock(&cp->lock);
> >  	/* We use connection tracking for the command connection */
> >  	cp->flags |= IP_VS_CONN_F_NFCT;
> > +	spin_unlock(&cp->lock);
> >  	return 0;
> 
> The conntrack support for FTP IPVS helper seems to be there since
> 2.6.37.
> 
> However, the patch description mentions 2.6.32.43.
> 
> Something doesn't match here, could you clarify this?

My understanding is that the problem was observed with 2.6.32
but that it has been around for much longer.

^ permalink raw reply

* Re: [RESEND] lpc_eth: remove duplicated include
From: David Miller @ 2012-07-17  5:17 UTC (permalink / raw)
  To: djduanjiong; +Cc: netdev
In-Reply-To: <50040911.5050107@gmail.com>

From: Duan Jiong <djduanjiong@gmail.com>
Date: Mon, 16 Jul 2012 20:29:05 +0800

> Remove duplicated #include <linux/delay.h> in
> drivers/net/ethernet/nxp/lpc_eth.c
> 
> Signed-off-by: Duan Jiong<djduanjiong@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: Is TCP vulneribility patch (as in RFC 5961) done in linux?
From: David Miller @ 2012-07-17  5:20 UTC (permalink / raw)
  To: kkiran; +Cc: eric.dumazet, netdev
In-Reply-To: <68700EDA775E5E47B5EBA9FF8AC0F15C078886@SJEXCHMB09.corp.ad.broadcom.com>


Please do not top-post.

Please do not quote an entire patch in a reply because it:

1) Wastes bandwidth, everyone on the list now has to receive another
   copy of the patch.  Actually, because you did this twice, people
   have to receive 3 total copies.

2) It causes confusion in our patch management database, because now
   I have to sift through and remove the two extra copies of this
   patch your emails added.

^ permalink raw reply

* Re: [PATCH] natsemi: make cable length magic configurable
From: David Miller @ 2012-07-17  5:22 UTC (permalink / raw)
  To: broonie; +Cc: bhutchings, jdelvare, netdev, thockin, okir
In-Reply-To: <20120716135740.GB22600@sirena.org.uk>

From: Mark Brown <broonie@opensource.wolfsonmicro.com>
Date: Mon, 16 Jul 2012 14:57:40 +0100

> On Mon, Jul 16, 2012 at 02:08:54PM +0100, Ben Hutchings wrote:
>> On Mon, 2012-07-16 at 14:26 +0200, Jean Delvare wrote:
> 
>> > Furthermore I don't quite get why we can't just go with the module
>> > parameter. As I understand it, this is a crappy driver for crappy, rare
>> > hardware. The driver already has a module parameter to work around a
>> > hardware bug (dspcfg_workaround), I don't quite see why adding a second
>> > one would be a problem. At least it is consistent.
> 
>> David can be quite insistent about finding an alternative to module
>> parameters.
> 
> The hardware isn't that rare or bad - it's pretty widely deployed in
> 100Mbit systems (including lots of embedded ones) and performs well for
> the systems it's targetting.  You'd not use it for a modern server but
> if what you need is 100M it's a fairly good part.

I don't want to hear any excuses for not implementing a configuration
interface properly.  I'm sure if I allowed it, everyone would be able
to come up with a special set of circumstances which justified the
module parameter.

Sorry, no.

^ permalink raw reply

* Re: [PATCH] net-next: make sock diag per-namespace (v2)
From: David Miller @ 2012-07-17  5:23 UTC (permalink / raw)
  To: avagin
  Cc: kuznet, jmorris, yoshfuji, kaber, xemul, eric.dumazet,
	linux-kernel, netdev
In-Reply-To: <1342448929-1809316-1-git-send-email-avagin@openvz.org>

From: Andrew Vagin <avagin@openvz.org>
Date: Mon, 16 Jul 2012 18:28:49 +0400

> Before this patch sock_diag works for init_net only and dumps
> information about sockets from all namespaces.
> 
> This patch expands sock_diag for all name-spaces.
> It creates a netlink kernel socket for each netns and filters
> data during dumping.
> 
> v2: filter accoding with netns in all places
>     remove an unused variable.
> 
> Signed-off-by: Andrew Vagin <avagin@openvz.org>

Applied, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox