From: "Li, ZhenHua" <zhen-hual@hp.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>,
	Baoquan He <bhe@redhat.com>,
	"Vaden, Tom L (HP Server OS Architecture)" <tom.vaden@hp.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"Hoemann, Jerry" <jerry.hoemann@hp.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"open list:INTEL IOMMU (VT-d)" <iommu@lists.linux-foundation.org>,
	doug.hatch@hp.com,
	"ishii.hironobu@jp.fujitsu.com" <ishii.hironobu@jp.fujitsu.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	zhenhua@hp.com, David Woodhouse <dwmw2@infradead.org>
Subject: Re: [PATCH 0/8] iommu/vt-d: Fix crash dump failure caused by legacy DMA/IO
Date: Wed, 22 Oct 2014 11:08:36 +0800
Message-ID: <54471FB4.4030602@hp.com>
In-Reply-To: <87mw8on7lx.fsf@x220.int.ebiederm.org>

I need more time to read and think about these mails. I just want to
clarify one thing: Bill has left HP, and I have inherited his work.
That's why I sent an update of his patch:
	https://lkml.org/lkml/2014/10/21/134

On 10/22/2014 10:47 AM, Eric W. Biederman wrote:
> Bjorn Helgaas <bhelgaas@google.com> writes:
>
>> [-cc Bill, +cc Zhen-Hua, Eric, Tom, Jerry]
>>
>> Hi Joerg,
>>
>> I was looking at Zhen-Hua's recent patches, trying to figure out if I
>> need to do anything with them.  Resetting devices in the old kernel
>> seems like a non-starter.  Resetting devices in the new kernel, ...,
>> well, maybe.  It seems ugly, and it seems like the sort of problem
>> that IOMMUs are designed to solve.  Anyway, I found this old
>> discussion that I didn't quite understand:
>
> For context, here is the kexec on panic design, and what I know from
> previous rounds of similar conversations.
>
> The way kexec on panic aka kdump is designed to work is that the
> recovery kernel lives in a piece of memory reserved at boot time and
> known not to be in use by any driver (because we never ever use it for
> DMA).  If DMAs continue from any source, the old kernel may be a little
> more corrupted, but our currently running kernel should not be.
>
> Device drivers that we use in the recovery kernel are required to be
> able to initialize their devices from an arbitrary state or fail to
> initialize their devices.
>
> We have discussed things on various occasions, but IOMMUs all have their
> own individual idiosyncrasies and came late to the party, so it
> is hard to generalize.
>
> The reserved region is generally low enough in memory that simply
> not using IOMMUs works.
>
> The major challenge with initializing an IOMMU would be that there are
> potentially devices whose driver is not loaded in the recovery kernel
> with on-going DMA sessions (perhaps a NIC in response to a network
> packet).
>
> This essentially means that if you are going to use an IOMMU slot in a
> recovery kernel, you have to either know that the IOMMU slot was reserved
> for the recovery kernel (which has always felt like the easiest way to
> me), or know that everything that could target that IOMMU slot has
> been reset or has its driver loaded.
>
> I have always thought the simplest and easiest solution would be to
> reserve a few IOMMU slots for the kexec on panic kernel.  But if folks
> can find other ways to guarantee that an on-going DMA isn't targeting
> an IOMMU slot (such as resetting everything downstream from that
> IOMMU slot), more power to you.
>
>> On Wed, Jul 2, 2014 at 7:32 AM, Joerg Roedel <joro@8bytes.org> wrote:
>>> On Wed, Apr 30, 2014 at 11:49:33AM +0100, David Woodhouse wrote:
>>
>>>> After the last round of this patchset, we discussed a potential
>>>> improvement where you point every virtual bus address at the *same*
>>>> physical scratch page.
>>>
>>> That is a solution to prevent the in-flight DMA failures. But what
>>> happens when there is some in-flight DMA to a disk to write some inodes
>>> or a new superblock?  Then this scratch address-space may cause
>>> filesystem corruption at worst.
>>
>> This in-flight DMA is from a device programmed by the old kernel, and
>> it would be reading data from the old kernel's buffers.  I think
>> you're suggesting that we might want that DMA read to complete so the
>> device can update filesystem metadata?
>>
>> I don't really understand that argument.  Don't we usually want to
>> stop any data from escaping the machine after a crash, on the theory
>> that the old kernel is crashing because something is catastrophically
>> wrong and we may have already corrupted things in memory?  If so,
>> allowing this old DMA to complete is just as likely to make things
>> worse as to make them better.
>>
>> Without kdump, we likely would reboot through the BIOS and the device
>> would get reset and the DMA would never happen at all.  So if we made
>> the dump kernel program the IOMMU to prevent the DMA, that seems like
>> a similar situation.
>>
>>> So with this in mind I would prefer initially taking over the
>>> page-tables from the old kernel before the device drivers re-initialize
>>> the devices.
>>
>> This makes the dump kernel more dependent on data from the old kernel,
>> which we obviously want to avoid when possible.
>>
>> I didn't find the previous discussion where pointing every virtual bus
>> address at the same physical scratch page was proposed.  Why was that
>> better than programming the IOMMU to reject every DMA?
>>
>> Bjorn
>
> Eric
>
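
The three options debated in the quoted thread (reject every DMA, point
every bus address at one scratch page, or take over the old kernel's page
tables) can be compared with a toy model. This is only a sketch under
loud assumptions: ToyIommu, the helper functions, and the page numbers are
all hypothetical, the table is a flat one-level map rather than the real
VT-d root/context/page-table hierarchy, and nothing here is taken from the
patch series itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

PAGE_SIZE = 4096


@dataclass
class ToyIommu:
    """Hypothetical one-level IOMMU: bus page number -> physical page number."""
    table: Dict[int, int] = field(default_factory=dict)
    scratch_page: Optional[int] = None  # fallback target for unmapped bus pages
    faults: int = 0

    def translate(self, bus_addr: int) -> Optional[int]:
        page, offset = divmod(bus_addr, PAGE_SIZE)
        if page in self.table:
            return self.table[page] * PAGE_SIZE + offset
        if self.scratch_page is not None:
            return self.scratch_page * PAGE_SIZE + offset
        self.faults += 1  # no mapping and no scratch page: the DMA is rejected
        return None


def reject_all() -> ToyIommu:
    """Dump kernel starts with empty tables, so every stale DMA faults."""
    return ToyIommu()


def scratch_everything(scratch_page: int) -> ToyIommu:
    """Every bus address resolves to the same physical scratch page."""
    return ToyIommu(scratch_page=scratch_page)


def inherit_old_tables(old_table: Dict[int, int]) -> ToyIommu:
    """Dump kernel copies the mappings the old kernel left behind."""
    return ToyIommu(table=dict(old_table))


def where_does_dma_land(iommu: ToyIommu, bus_addr: int, scratch_page: int) -> str:
    phys = iommu.translate(bus_addr)
    if phys is None:
        return "rejected"
    if phys // PAGE_SIZE == scratch_page:
        return "scratch page"
    return "old kernel buffer"


# Assumed example values: the old kernel mapped bus page 0x10 to one of its
# own buffers, and the dump kernel owns physical page 0x150 as scratch.
OLD_TABLE = {0x10: 0x5000}
SCRATCH = 0x150
IN_FLIGHT = 0x10 * PAGE_SIZE + 0x20  # a DMA the old kernel programmed

if __name__ == "__main__":
    policies = [
        ("reject all", reject_all()),
        ("scratch page", scratch_everything(SCRATCH)),
        ("inherit tables", inherit_old_tables(OLD_TABLE)),
    ]
    for name, iommu in policies:
        print(f"{name}: {where_does_dma_land(iommu, IN_FLIGHT, SCRATCH)}")
```

In this model only the inherited tables let the stale transfer reach the
old kernel's buffer, which is the dependence on old-kernel data that Bjorn
objects to; the other two policies keep it away from both the old buffers
and the rest of the dump kernel's memory.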



