From: Anders K. Pedersen | Cohaesio <akp@cohaesio.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] Linux 4.12+ memory leak on router with i40e NICs
Date: Sun, 22 Oct 2017 13:56:40 +0000
Message-ID: <1508680600.4970.26.camel@cohaesio.com>
In-Reply-To: <CAKgT0Ud1ftJ187qz_XBVnx+X_1BQTgWWe4sA307_3VmpMSSVaw@mail.gmail.com>

On Thu, 2017-10-19 at 08:40 -0700, Alexander Duyck wrote:
> On Thu, Oct 19, 2017 at 5:19 AM, Anders K. Pedersen | Cohaesio
> <akp@cohaesio.com> wrote:
> > Hi Alex,
> >
> > On Wed, 2017-10-18 at 16:37 -0700, Alexander Duyck wrote:
> > > When we last talked I had asked if you could do a git bisect to find
> > > the memory leak and you said you would look into it. The most useful
> > > way to solve this would be to do a git bisect between your current
> > > kernel and the 4.11 kernel to find the point at which this started.
> > > If we can do that then fixing this becomes much simpler as we just
> > > have to fix the patch that introduced the issue.
> >
> > We're also seeing a smaller memory leak (about 1 GB per day) than the
> > original one even with the "Fix memory leak related filter programming
> > status" fix applied. So far I've determined that the leak is present on
> > 4.13.7 and was introduced between 4.11 and 4.12, so I'll do another
> > round of bisection to identify the patch that introduced this.
> >
> > Since the router must run for a couple of hours before I can be sure
> > whether a kernel is good or bad, and I can't reboot it during working
> > hours, it'll probably be about a week before I have a result.
> >
> > --
> > Venlig hilsen / Best Regards
> >
> > Anders K. Pedersen
> > Senior Technical Manager
>
> Anders,
>
> I'll do some digging on my side to see if I can find any other memory
> leaks that might be floating around in the driver that could have been
> introduced during that time-frame.
>
> One thing you might try that would help with your testing would be to
> just disable the ATR functionality in i40e. You can do that with the
> ethtool command "ethtool --set-priv-flags <iface> flow-director-atr
> off". That should allow you to bisect this without needing to deal
> with the "programming status" patches since you won't be programming
> ATR filters which is what caused that leak.
>
> Thanks for looking into this.
>
> - Alex

Hi Alex,

I began bisecting and applied the known fix patches at each step where
they were applicable (i.e. without changing the flow-director-atr
flag). Some of the steps had a high number of packet drops, which
caused problems for our network, so I couldn't leave them running for
the several hours needed to determine whether the leak is present.
The part of the bisection I got through had the same outcome as the
last bisection, which led to "i40e: Fix support for flow director
programming status".
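
Judging whether a given kernel leaks takes hours of observation, so it can help to log memory at intervals and turn two samples into a rate. The following is a minimal sketch, not the procedure used in this thread: the function names and the MEMINFO override are hypothetical, and it assumes that a steady fall in MemAvailable from /proc/meminfo is a usable proxy for the leak.

```shell
#!/bin/sh
# Sketch only. MEMINFO is a hypothetical override so the parser can be
# pointed at a saved copy of /proc/meminfo.
MEMINFO="${MEMINFO:-/proc/meminfo}"

# Print the MemAvailable value in kB.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "$MEMINFO"
}

# Print one sample as "<epoch seconds> <MemAvailable kB>".
log_sample() {
    printf '%s %s\n' "$(date +%s)" "$(mem_available_kb)"
}

# Estimate the loss in MB per day from two samples.
# Arguments: t0 kB0 t1 kB1 (epoch seconds and kB, oldest sample first).
leak_mb_per_day() {
    awk -v t0="$1" -v m0="$2" -v t1="$3" -v m1="$4" \
        'BEGIN { printf "%.0f\n", (m0 - m1) / 1024 * 86400 / (t1 - t0) }'
}
```

Calling log_sample from cron every few minutes and feeding the first and last lines to leak_mb_per_day gives a rough figure to compare against the roughly 1 GB per day reported earlier. Ordinary workload changes also move MemAvailable, so a short window can mislead.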

After that I experimented a bit with the flow-director-atr flag, and it
turns out that if I disable this flag on all the NICs, the memory leak
is gone, so I suspected that the smaller memory leak was also caused by
"i40e: Fix support for flow director programming status". I tried to
revert this patch from 4.13 (with a manual fixup for the trace point
that had been added later), but that brought back the packet drops, so
I couldn't let it run.
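
Disabling the flag on all the NICs, as described above, can be scripted rather than typed once per port. This is a minimal sketch under stated assumptions: the ethtool invocation is the one quoted from Alex, while the function names, the SYSFS_NET and ETHTOOL overrides, and the sysfs-based discovery of i40e ports are illustrative choices, not something taken from this thread.

```shell
#!/bin/sh
# Sketch only. SYSFS_NET and ETHTOOL are hypothetical overrides added for
# illustration; setting ETHTOOL=echo turns the loop into a dry run.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"
ETHTOOL="${ETHTOOL:-ethtool}"

# Print the name of every network interface bound to the i40e driver.
# Assumes the usual sysfs layout, where <iface>/device/driver is a
# symlink whose target ends in the driver name.
list_i40e_ifaces() {
    for dev in "$SYSFS_NET"/*; do
        [ -L "$dev/device/driver" ] || continue
        drv=$(basename "$(readlink "$dev/device/driver")")
        if [ "$drv" = "i40e" ]; then
            basename "$dev"
        fi
    done
}

# Turn ATR off on each i40e interface using the private flag quoted above.
disable_atr_all() {
    for iface in $(list_i40e_ifaces); do
        "$ETHTOOL" --set-priv-flags "$iface" flow-director-atr off
    done
}
```

Setting ETHTOOL=echo before calling disable_atr_all prints the commands instead of running them. The current state of the flag can be checked per port with "ethtool --show-priv-flags <iface>".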

This morning I saw your "i40e: Add programming descriptors to
cleaned_count" patch, so I tried 4.13.9 with that patch and the
previous "i40e: Fix memory leak related filter programming status"
patch, without turning off the flow-director-atr flag. So far this
combination is running stably without any memory leaks.

Thanks for fixing this.

Regards,
Anders