Discussion:
[PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*
Christoph Hellwig
2018-11-29 14:14:29 UTC
Permalink
dma_map_sg() expects a DMA domain. However, the drm devices
have been traditionally using unmanaged iommu domain which
is non-dma type. Using dma mapping APIs with that domain is bad.
Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}()
to do the cache maintenance.
As I told you before: hell no. If you spent the slightest amount of
actually trying to understand what you are doing here you'd know this
can't work. Just turn on dma debugging and this will blow up in your
face.

Either you use the DMA API properly, that is you use it to map and
to sync, or you don't use it at all. Mix and match between iommu
APIs and DMA APIs is simply not possible.
Rob Clark
2018-11-29 14:25:43 UTC
Permalink
Post by Christoph Hellwig
dma_map_sg() expects a DMA domain. However, the drm devices
have been traditionally using unmanaged iommu domain which
is non-dma type. Using dma mapping APIs with that domain is bad.
Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}()
to do the cache maintenance.
As I told you before: hell no. If you spent the slightest amount of
actually trying to understand what you are doing here you'd know this
can't work. Just turn on dma debugging and this will blow up in your
face.
you can tone it down.. we weren't the ones who created the dma/iommu
mess, we are just trying to find a way to work around it
Post by Christoph Hellwig
Either you use the DMA API properly, that is you use it to map and
to sync, or you don't use it at all. Mix and match between iommu
APIs and DMA APIs is simply not possible.
I'd *love* nothing more to not use the dma api.. but on arm there is
no other way to do cache maint.

BR,
-R
Rob Clark
2018-11-29 14:42:50 UTC
Permalink
Post by Rob Clark
Post by Christoph Hellwig
dma_map_sg() expects a DMA domain. However, the drm devices
have been traditionally using unmanaged iommu domain which
is non-dma type. Using dma mapping APIs with that domain is bad.
Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}()
to do the cache maintenance.
As I told you before: hell no. If you spent the slightest amount of
actually trying to understand what you are doing here you'd know this
can't work. Just turn on dma debugging and this will blow up in your
face.
you can tone it down.. we weren't the ones who created the dma/iommu
mess, we are just trying to find a way to work around it
Post by Christoph Hellwig
Either you use the DMA API properly, that is you use it to map and
to sync, or you don't use it at all. Mix and match between iommu
APIs and DMA APIs is simply not possible.
I'd *love* nothing more to not use the dma api.. but on arm there is
no other way to do cache maint.
btw, one of us will try this w/ dma debugging enabled..

Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).

Like I've said before, I'm open to other suggestions. But the
automagic behind-the-scenes dma+iommu hookup really screwed us, and we
need to find some sort of solution instead of downstream hacks.

BR,
-R
Christoph Hellwig
2018-11-29 15:54:18 UTC
Permalink
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
Rob Clark
2018-11-29 18:48:15 UTC
Permalink
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)

BR,
-R
Jordan Crouse
2018-11-29 19:40:29 UTC
Permalink
Post by Rob Clark
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)
Right - basically the DMA domain steals context bank 0 and the GPU is hard coded
to use that context bank for pagetable switching.

I believe the Tegra guys also had a similar problem with a hard coded context
bank.

This is a discussion we do need to have but not at the risk of derailing the
caching discussion which is arguably more important and has much wider ranging
implications for multimedia and Ion and such.

Jordan
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Tomasz Figa
2018-11-29 19:57:16 UTC
Permalink
Post by Jordan Crouse
Post by Rob Clark
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)
Right - basically the DMA domain steals context bank 0 and the GPU is hard coded
to use that context bank for pagetable switching.
I believe the Tegra guys also had a similar problem with a hard coded context
bank.
Wait, if we detach the GPU/display struct device from the default
domain and attach it to a newly allocated domain, wouldn't the newly
allocated domain use the context bank we need? Note that we're already
doing that, except that we're doing it behind the back of the DMA
mapping subsystem, so that it keeps using the IOMMU version of the DMA
ops for the device and doing any mapping operations on the default
domain. If we ask the DMA mapping to detach, wouldn't it essentially
solve the problem?

Best regards,
Tomasz
Robin Murphy
2018-11-29 20:03:51 UTC
Permalink
Post by Tomasz Figa
Post by Jordan Crouse
Post by Rob Clark
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)
Right - basically the DMA domain steals context bank 0 and the GPU is hard coded
to use that context bank for pagetable switching.
I believe the Tegra guys also had a similar problem with a hard coded context
bank.
AIUI, they don't need a specific hardware context, they just need to
know which one they're actually using, which the domain abstraction hides.
Post by Tomasz Figa
Wait, if we detach the GPU/display struct device from the default
domain and attach it to a newly allocated domain, wouldn't the newly
allocated domain use the context bank we need? Note that we're already
The arm-smmu driver doesn't, but there's no fundamental reason it
couldn't. That should just need code to refcount domain users and
release hardware contexts for domains with no devices currently attached.

Robin.
Post by Tomasz Figa
doing that, except that we're doing it behind the back of the DMA
mapping subsystem, so that it keeps using the IOMMU version of the DMA
ops for the device and doing any mapping operations on the default
domain. If we ask the DMA mapping to detach, wouldn't it essentially
solve the problem?
Best regards,
Tomasz
Tomasz Figa
2018-11-30 00:23:59 UTC
Permalink
Post by Robin Murphy
Post by Tomasz Figa
Post by Jordan Crouse
Post by Rob Clark
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)
Right - basically the DMA domain steals context bank 0 and the GPU is hard coded
to use that context bank for pagetable switching.
I believe the Tegra guys also had a similar problem with a hard coded context
bank.
AIUI, they don't need a specific hardware context, they just need to
know which one they're actually using, which the domain abstraction hides.
Post by Tomasz Figa
Wait, if we detach the GPU/display struct device from the default
domain and attach it to a newly allocated domain, wouldn't the newly
allocated domain use the context bank we need? Note that we're already
The arm-smmu driver doesn't, but there's no fundamental reason it
couldn't. That should just need code to refcount domain users and
release hardware contexts for domains with no devices currently attached.
Robin.
Post by Tomasz Figa
doing that, except that we're doing it behind the back of the DMA
mapping subsystem, so that it keeps using the IOMMU version of the DMA
ops for the device and doing any mapping operations on the default
domain. If we ask the DMA mapping to detach, wouldn't it essentially
solve the problem?
Thanks Robin.

Still, my point is that the MSM DRM driver attaches the GPU struct
device to a new domain it allocates using iommu_domain_alloc() and it
seems to work fine, so I believe it's not the problem we're looking
into with this patch.

Best regards,
Tomasz
Tomasz Figa
2018-12-01 02:05:12 UTC
Permalink
Post by Tomasz Figa
Post by Robin Murphy
Post by Tomasz Figa
Post by Jordan Crouse
Post by Rob Clark
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)
Right - basically the DMA domain steals context bank 0 and the GPU is hard coded
to use that context bank for pagetable switching.
I believe the Tegra guys also had a similar problem with a hard coded context
bank.
AIUI, they don't need a specific hardware context, they just need to
know which one they're actually using, which the domain abstraction hides.
Post by Tomasz Figa
Wait, if we detach the GPU/display struct device from the default
domain and attach it to a newly allocated domain, wouldn't the newly
allocated domain use the context bank we need? Note that we're already
The arm-smmu driver doesn't, but there's no fundamental reason it
couldn't. That should just need code to refcount domain users and
release hardware contexts for domains with no devices currently attached.
Robin.
Post by Tomasz Figa
doing that, except that we're doing it behind the back of the DMA
mapping subsystem, so that it keeps using the IOMMU version of the DMA
ops for the device and doing any mapping operations on the default
domain. If we ask the DMA mapping to detach, wouldn't it essentially
solve the problem?
Thanks Robin.
Still, my point is that the MSM DRM driver attaches the GPU struct
device to a new domain it allocates using iommu_domain_alloc() and it
seems to work fine, so I believe it's not the problem we're looking
into with this patch.
Could we just make the MSM DRM call arch_teardown_dma_ops() and then
arch_setup_dma_ops() with the `iommu` argument set to NULL and be done
with it?

Best regards,
Tomasz
Rob Clark
2018-12-01 11:46:44 UTC
Permalink
Post by Tomasz Figa
Post by Tomasz Figa
Post by Robin Murphy
Post by Tomasz Figa
Post by Jordan Crouse
Post by Rob Clark
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)
Right - basically the DMA domain steals context bank 0 and the GPU is hard coded
to use that context bank for pagetable switching.
I believe the Tegra guys also had a similar problem with a hard coded context
bank.
AIUI, they don't need a specific hardware context, they just need to
know which one they're actually using, which the domain abstraction hides.
Post by Tomasz Figa
Wait, if we detach the GPU/display struct device from the default
domain and attach it to a newly allocated domain, wouldn't the newly
allocated domain use the context bank we need? Note that we're already
The arm-smmu driver doesn't, but there's no fundamental reason it
couldn't. That should just need code to refcount domain users and
release hardware contexts for domains with no devices currently attached.
Robin.
Post by Tomasz Figa
doing that, except that we're doing it behind the back of the DMA
mapping subsystem, so that it keeps using the IOMMU version of the DMA
ops for the device and doing any mapping operations on the default
domain. If we ask the DMA mapping to detach, wouldn't it essentially
solve the problem?
Thanks Robin.
Still, my point is that the MSM DRM driver attaches the GPU struct
device to a new domain it allocates using iommu_domain_alloc() and it
seems to work fine, so I believe it's not the problem we're looking
into with this patch.
Could we just make the MSM DRM call arch_teardown_dma_ops() and then
arch_setup_dma_ops() with the `iommu` argument set to NULL and be done
with it?
I don't think those are exported to modules?

I have actually a simpler patch, that adds a small blacklist to check
in of_dma_configure() before calling arch_setup_dma_ops(), which can
replace this patch. It also solves the problem of dma api allocating
the context bank that he gpu wants to use for context-switching, and
should be a simple thing to backport to stable branches.

I was just spending some time trying to figure out what changed
recently to start causing dma_map_sg() to opps on boot for us, so I
could write a more detailed commit msg.

BR,
-R
Tomasz Figa
2018-12-03 00:12:19 UTC
Permalink
Post by Rob Clark
Post by Tomasz Figa
Post by Tomasz Figa
Post by Robin Murphy
Post by Tomasz Figa
Post by Jordan Crouse
Post by Rob Clark
Post by Christoph Hellwig
Post by Rob Clark
Maybe the thing we need to do is just implement a blacklist of
compatible strings for devices which should skip the automatic
iommu/dma hookup. Maybe a bit ugly, but it would also solve a problem
preventing us from enabling per-process pagetables for a5xx (where we
need to control the domain/context-bank that is allocated by the dma
api).
You can detach from the dma map attachment using arm_iommu_detach_device,
which a few drm drivers do, but I don't think this is the problem.
I think even with detach, we wouldn't end up with the context-bank
that the gpu firmware was hard-coded to expect, and so it would
overwrite the incorrect page table address register. (I could be
mis-remembering that, Jordan spent more time looking at that. But it
was something along those lines.)
Right - basically the DMA domain steals context bank 0 and the GPU is hard coded
to use that context bank for pagetable switching.
I believe the Tegra guys also had a similar problem with a hard coded context
bank.
AIUI, they don't need a specific hardware context, they just need to
know which one they're actually using, which the domain abstraction hides.
Post by Tomasz Figa
Wait, if we detach the GPU/display struct device from the default
domain and attach it to a newly allocated domain, wouldn't the newly
allocated domain use the context bank we need? Note that we're already
The arm-smmu driver doesn't, but there's no fundamental reason it
couldn't. That should just need code to refcount domain users and
release hardware contexts for domains with no devices currently attached.
Robin.
Post by Tomasz Figa
doing that, except that we're doing it behind the back of the DMA
mapping subsystem, so that it keeps using the IOMMU version of the DMA
ops for the device and doing any mapping operations on the default
domain. If we ask the DMA mapping to detach, wouldn't it essentially
solve the problem?
Thanks Robin.
Still, my point is that the MSM DRM driver attaches the GPU struct
device to a new domain it allocates using iommu_domain_alloc() and it
seems to work fine, so I believe it's not the problem we're looking
into with this patch.
Could we just make the MSM DRM call arch_teardown_dma_ops() and then
arch_setup_dma_ops() with the `iommu` argument set to NULL and be done
with it?
I don't think those are exported to modules?
Indeed, if we compile MSM DRM as modules, it wouldn't work...
Post by Rob Clark
I have actually a simpler patch, that adds a small blacklist to check
in of_dma_configure() before calling arch_setup_dma_ops(), which can
replace this patch. It also solves the problem of dma api allocating
the context bank that he gpu wants to use for context-switching, and
should be a simple thing to backport to stable branches.
I was just spending some time trying to figure out what changed
recently to start causing dma_map_sg() to opps on boot for us, so I
could write a more detailed commit msg.
Yeah, that sounds much better, thanks. Reviewed that patch.

Best regards,
Tomasz
Daniel Vetter
2018-11-29 14:43:50 UTC
Permalink
Post by Rob Clark
Post by Christoph Hellwig
dma_map_sg() expects a DMA domain. However, the drm devices
have been traditionally using unmanaged iommu domain which
is non-dma type. Using dma mapping APIs with that domain is bad.
Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}()
to do the cache maintenance.
As I told you before: hell no. If you spent the slightest amount of
actually trying to understand what you are doing here you'd know this
can't work. Just turn on dma debugging and this will blow up in your
face.
you can tone it down.. we weren't the ones who created the dma/iommu
mess, we are just trying to find a way to work around it
Post by Christoph Hellwig
Either you use the DMA API properly, that is you use it to map and
to sync, or you don't use it at all. Mix and match between iommu
APIs and DMA APIs is simply not possible.
I'd *love* nothing more to not use the dma api.. but on arm there is
no other way to do cache maint.
Yeah we had patches to add manual cache management code to drm, so we
don't have to abuse the dma streaming api anymore. Got shouted down.
Abusing the dma streaming api also gets shouted down. It's a gpu, any
idea of these drivers actually being platform independent is out of
the window from the start anyway, so we're ok with tying this to
platforms.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Christoph Hellwig
2018-11-29 15:57:58 UTC
Permalink
Post by Daniel Vetter
Yeah we had patches to add manual cache management code to drm, so we
don't have to abuse the dma streaming api anymore. Got shouted down.
Abusing the dma streaming api also gets shouted down. It's a gpu, any
idea of these drivers actually being platform independent is out of
the window from the start anyway, so we're ok with tying this to
platforms.
Manual or not the iommu API is missing APIs for cache management,
which makes it kinda surprising it actually ever worked for non-coherent
devices.

And fortunately while some people spent the last year ot two bickering
about the situation others actually did work, and we now have a
generic arch_sync_dma_for_device/arch_sync_dma_for_cpu kernel-internal
API. This is only used for DMA API internals so far, and explicitly
not intended for direct driver use, but it would be perfect as the
backend for iommu API cache maintainance functions. It exists on all
but two architectures on mainline. Out of those powerpc is in the works,
only arm32 will need some major help.
Daniel Vetter
2018-11-29 16:28:07 UTC
Permalink
Post by Christoph Hellwig
Post by Daniel Vetter
Yeah we had patches to add manual cache management code to drm, so we
don't have to abuse the dma streaming api anymore. Got shouted down.
Abusing the dma streaming api also gets shouted down. It's a gpu, any
idea of these drivers actually being platform independent is out of
the window from the start anyway, so we're ok with tying this to
platforms.
Manual or not the iommu API is missing APIs for cache management,
which makes it kinda surprising it actually ever worked for non-coherent
devices.
And fortunately while some people spent the last year ot two bickering
about the situation others actually did work, and we now have a
generic arch_sync_dma_for_device/arch_sync_dma_for_cpu kernel-internal
API. This is only used for DMA API internals so far, and explicitly
not intended for direct driver use, but it would be perfect as the
backend for iommu API cache maintainance functions. It exists on all
but two architectures on mainline. Out of those powerpc is in the works,
only arm32 will need some major help.
Oh, this sounds neat. At least some massive progress.

Just spend a bit of time reading through the implementations already
merged. Is the struct device *dev parameter actually needed anywhere?
dma-api definitely needs it, because we need that to pick the right iommu.
But for cache management from what I've seen the target device doesn't
matter, all the target specific stuff will be handled by the iommu.

Dropping the dev parameter would make this a perfect fit for coherency
management of buffers used by multiple devices. Right now there's all
kinds of nasty tricks for that use cases needed to avoid redundant
flushes.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Christoph Hellwig
2018-11-29 16:57:15 UTC
Permalink
Post by Daniel Vetter
Just spend a bit of time reading through the implementations already
merged. Is the struct device *dev parameter actually needed anywhere?
dma-api definitely needs it, because we need that to pick the right iommu.
But for cache management from what I've seen the target device doesn't
matter, all the target specific stuff will be handled by the iommu.
It looks like only mips every uses the dev argument, and even there
the function it passes dev to ignores it. So it could probably be removed.
Post by Daniel Vetter
Dropping the dev parameter would make this a perfect fit for coherency
management of buffers used by multiple devices. Right now there's all
kinds of nasty tricks for that use cases needed to avoid redundant
flushes.
Note that one thing I'd like to avoid is exposing these funtions directly
to drivers, as that will get us into all kinds of abuses.

So I'd much prefer if we could have iommu APIs wrapping these that are
specific to actual uses cases that we understand well.

As for the buffer sharing: at least for the DMA API side I want to
move the current buffer sharing users away from dma_alloc_coherent
(and coherent dma_alloc_attrs users) and the remapping done in there
required for non-coherent architectures. Instead I'd like to allocate
plain old pages, and then just dma map them for each device separately,
with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
or last user to unmap. On the iommu side it could probably work
similar.

I have done some preliminary work on this, and want to get it into this
merge window, but there is a few other bits I need to sort out first.
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
---end quoted text---
Daniel Vetter
2018-11-29 17:09:05 UTC
Permalink
Post by Christoph Hellwig
Post by Daniel Vetter
Just spend a bit of time reading through the implementations already
merged. Is the struct device *dev parameter actually needed anywhere?
dma-api definitely needs it, because we need that to pick the right iommu.
But for cache management from what I've seen the target device doesn't
matter, all the target specific stuff will be handled by the iommu.
It looks like only mips every uses the dev argument, and even there
the function it passes dev to ignores it. So it could probably be removed.
Post by Daniel Vetter
Dropping the dev parameter would make this a perfect fit for coherency
management of buffers used by multiple devices. Right now there's all
kinds of nasty tricks for that use cases needed to avoid redundant
flushes.
Note that one thing I'd like to avoid is exposing these funtions directly
to drivers, as that will get us into all kinds of abuses.
What kind of abuse do you expect? It could very well be that gpu folks
call that "standard use case" ... At least on x86 with the i915 driver
we pretty much rely on architectural guarantees for how cache flushes
work very much. Down to userspace doing the cache flushing for
mappings the kernel has set up.
Post by Christoph Hellwig
So I'd much prefer if we could have iommu APIs wrapping these that are
specific to actual uses cases that we understand well.
As for the buffer sharing: at least for the DMA API side I want to
move the current buffer sharing users away from dma_alloc_coherent
(and coherent dma_alloc_attrs users) and the remapping done in there
required for non-coherent architectures. Instead I'd like to allocate
plain old pages, and then just dma map them for each device separately,
with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
or last user to unmap. On the iommu side it could probably work
similar.
I think this is what's often done. Except then there's also the issue
of how to get at the cma allocator if your device needs something
contiguous. There's a lot of that still going on in graphics/media.
-Daniel
Post by Christoph Hellwig
I have done some preliminary work on this, and want to get it into this
merge window, but there is a few other bits I need to sort out first.
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
---end quoted text---
_______________________________________________
dri-devel mailing list
https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Tomasz Figa
2018-11-29 17:24:17 UTC
Permalink
[CC Marek]
Post by Daniel Vetter
Post by Christoph Hellwig
Post by Daniel Vetter
Just spend a bit of time reading through the implementations already
merged. Is the struct device *dev parameter actually needed anywhere?
dma-api definitely needs it, because we need that to pick the right iommu.
But for cache management from what I've seen the target device doesn't
matter, all the target specific stuff will be handled by the iommu.
It looks like only mips every uses the dev argument, and even there
the function it passes dev to ignores it. So it could probably be removed.
Whether the cache maintenance operation needs to actually do anything
or not is a function of `dev`. We can have some devices that are
coherent with CPU caches, and some that are not, on the same system.
Post by Daniel Vetter
Post by Christoph Hellwig
Post by Daniel Vetter
Dropping the dev parameter would make this a perfect fit for coherency
management of buffers used by multiple devices. Right now there's all
kinds of nasty tricks for that use cases needed to avoid redundant
flushes.
Note that one thing I'd like to avoid is exposing these funtions directly
to drivers, as that will get us into all kinds of abuses.
What kind of abuse do you expect? It could very well be that gpu folks
call that "standard use case" ... At least on x86 with the i915 driver
we pretty much rely on architectural guarantees for how cache flushes
work very much. Down to userspace doing the cache flushing for
mappings the kernel has set up.
i915 is a very specific case of a fully contained,
architecture-specific hardware subsystem, where you can just hardcode
all integration details inside the driver, because nobody else would
care.

In ARM world, you can have the same IP blocks licensed by multiple SoC
vendors with different integration details and that often includes the
option of coherency.
Post by Daniel Vetter
Post by Christoph Hellwig
So I'd much prefer if we could have iommu APIs wrapping these that are
specific to actual uses cases that we understand well.
As for the buffer sharing: at least for the DMA API side I want to
move the current buffer sharing users away from dma_alloc_coherent
(and coherent dma_alloc_attrs users) and the remapping done in there
required for non-coherent architectures. Instead I'd like to allocate
plain old pages, and then just dma map them for each device separately,
with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
or last user to unmap. On the iommu side it could probably work
similar.
I think this is what's often done. Except then there's also the issue
of how to get at the cma allocator if your device needs something
contiguous. There's a lot of that still going on in graphics/media.
I suppose one could just expose CMA with the default pool directly.
It's just an allocator, so not sure why it would need any
device-specific information.

There is also the use case of using CMA with device-specific pools of
memory reusable by the system when not used by the device and those
would have to somehow get the pool to allocate from, but I wonder if
struct device is the right way to pass such information. I'd see the
pool given explicitly like cma_alloc(struct cma_pool *, size, flags)
and perhaps a wrapper cma_alloc_default(size, flags) that is just a
simple macro calling cma_alloc(&cma_pool_default, size, flags).

Best regards,
Tomasz
Christoph Hellwig
2018-11-29 18:35:06 UTC
Permalink
Post by Tomasz Figa
Whether the cache maintenance operation needs to actually do anything
or not is a function of `dev`. We can have some devices that are
coherent with CPU caches, and some that are not, on the same system.
Yes, but that part is not decided by these low-level helpers but their
callers in the DMA code (or maybe IOMMU code as well in the future).
Post by Tomasz Figa
There is also the use case of using CMA with device-specific pools of
memory reusable by the system when not used by the device and those
would have to somehow get the pool to allocate from, but I wonder if
struct device is the right way to pass such information. I'd see the
pool given explicitly like cma_alloc(struct cma_pool *, size, flags)
and perhaps a wrapper cma_alloc_default(size, flags) that is just a
simple macro calling cma_alloc(&cma_pool_default, size, flags).
Yes, the cma APIs have quite a few warts that need addressing.
I have a few of those things on my todo list, but help is always
welcome.
Rob Clark
2018-11-29 18:57:38 UTC
Permalink
Post by Tomasz Figa
[CC Marek]
Post by Daniel Vetter
Post by Christoph Hellwig
Note that one thing I'd like to avoid is exposing these funtions directly
to drivers, as that will get us into all kinds of abuses.
What kind of abuse do you expect? It could very well be that gpu folks
call that "standard use case" ... At least on x86 with the i915 driver
we pretty much rely on architectural guarantees for how cache flushes
work very much. Down to userspace doing the cache flushing for
mappings the kernel has set up.
i915 is a very specific case of a fully contained,
architecture-specific hardware subsystem, where you can just hardcode
all integration details inside the driver, because nobody else would
care.
In ARM world, you can have the same IP blocks licensed by multiple SoC
vendors with different integration details and that often includes the
option of coherency.
fwiw, I believe all the GPU IP blocks that are used across multiple
SoCs have their own GPU MMU (potentially in addition to an iommu?).
So the dma-api is a much better fit for them.. drm/msm is a lot
closer to drm/i915 scenario, so I don't so much care if the solution
to our unique problem isn't something that would work for other
drivers ;-)

BR,
-R
Daniel Vetter
2018-11-30 09:40:02 UTC
Permalink
Post by Rob Clark
Post by Tomasz Figa
[CC Marek]
Post by Daniel Vetter
Post by Christoph Hellwig
Note that one thing I'd like to avoid is exposing these funtions directly
to drivers, as that will get us into all kinds of abuses.
What kind of abuse do you expect? It could very well be that gpu folks
call that "standard use case" ... At least on x86 with the i915 driver
we pretty much rely on architectural guarantees for how cache flushes
work very much. Down to userspace doing the cache flushing for
mappings the kernel has set up.
i915 is a very specific case of a fully contained,
architecture-specific hardware subsystem, where you can just hardcode
all integration details inside the driver, because nobody else would
care.
In ARM world, you can have the same IP blocks licensed by multiple SoC
vendors with different integration details and that often includes the
option of coherency.
fwiw, I believe all the GPU IP blocks that are used across multiple
SoCs have their own GPU MMU (potentially in addition to an iommu?).
So the dma-api is a much better fit for them.. drm/msm is a lot
closer to drm/i915 scenario, so I don't so much care if the solution
to our unique problem isn't something that would work for other
drivers ;-)
Right now maybe, but I fully except the entire coherent vs. non-coherent
transactions hilarity that we have on the bigger intel socs since a few
years already to trickle down into smaller (arm based socs) eventually. I
think Apple is already there since a few generations.

So maybe we won't have to fight the iommu side of the dma-api anymore
on these, but we'll still have to fight the cache maintenance side of
dma-api. You can tell the dma-api to not flush, but then you don't have
any other way to actually flush that's acceptable for arch/arm (on x86 we
just run clflush in userspace and call it a day).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Vetter
2018-11-30 09:35:27 UTC
Permalink
Post by Tomasz Figa
[CC Marek]
Post by Daniel Vetter
Post by Christoph Hellwig
Post by Daniel Vetter
Just spend a bit of time reading through the implementations already
merged. Is the struct device *dev parameter actually needed anywhere?
dma-api definitely needs it, because we need that to pick the right iommu.
But for cache management from what I've seen the target device doesn't
matter, all the target specific stuff will be handled by the iommu.
It looks like only mips every uses the dev argument, and even there
the function it passes dev to ignores it. So it could probably be removed.
Whether the cache maintenance operation needs to actually do anything
or not is a function of `dev`. We can have some devices that are
coherent with CPU caches, and some that are not, on the same system.
So the thing is that the gpu driver knows this too. It fairly often can
even decide at runtime (through surface state bits or gpu pte bits)
whether to use coherent or non-coherent transactions. dma-api assuming
that a given device is always coherent or non-coherent is one of the
fundamental mismatches we have.

If you otoh need dev because there's some strange bridge caches you need
to flush (I've never seen that, but who knows), that would be a diffeernt
thing. All the bridge flushing I've seen is attached to the iommu though,
so would be really a surprise if the cache management needs that too.
Post by Tomasz Figa
Post by Daniel Vetter
Post by Christoph Hellwig
Post by Daniel Vetter
Dropping the dev parameter would make this a perfect fit for coherency
management of buffers used by multiple devices. Right now there's all
kinds of nasty tricks for that use cases needed to avoid redundant
flushes.
Note that one thing I'd like to avoid is exposing these funtions directly
to drivers, as that will get us into all kinds of abuses.
What kind of abuse do you expect? It could very well be that gpu folks
call that "standard use case" ... At least on x86 with the i915 driver
we pretty much rely on architectural guarantees for how cache flushes
work very much. Down to userspace doing the cache flushing for
mappings the kernel has set up.
i915 is a very specific case of a fully contained,
architecture-specific hardware subsystem, where you can just hardcode
all integration details inside the driver, because nobody else would
care.
Nah, it's not fully contained, it's just with your arm eyewear on you
can't see how badly it leaks all over the place. The problem is that GPUs
(in many cases at least) want to decide whether and when to do cache
maintenance. We need a combo of iommu and cache maintenance apis that
allows this, across multiple devices, because the dma-api doesn't cut it.

Aside: The cache maintenance api _must_ do the flush when asked to, even
if it believes no cache maintenance is necessary. Even if the default mode
is coherent for a platform/dev combo, the gpu driver might want to do a
non-coherent transaction.
Post by Tomasz Figa
In ARM world, you can have the same IP blocks licensed by multiple SoC
vendors with different integration details and that often includes the
option of coherency.
My experience is that for soc gpus, you need to retune up/download code
for every soc. Or at least every family of socs.

This is painful. I guess thus far the arm soc gpu drivers we have merged
aren't the ones that needed widely different tuning on different soc
families/ranges. What's worse, it's userspace who will decide whether to
use the coherent or non-coherent paths in many cases (that's at least how
it worked since decades on the big gpus, small gpus just lag a few years
usually). The kerenl only provides the tools for the userspace
opengl/vulkan driver to do all the things.

Trying to hide coherent vs. non-coherent like it's done for everyone else
is probably not going to work for gpus. In fact, hasn't ever worked for
gpus thus far, and unlikely to change I think.
-Daniel
Post by Tomasz Figa
Post by Daniel Vetter
Post by Christoph Hellwig
So I'd much prefer if we could have iommu APIs wrapping these that are
specific to actual uses cases that we understand well.
As for the buffer sharing: at least for the DMA API side I want to
move the current buffer sharing users away from dma_alloc_coherent
(and coherent dma_alloc_attrs users) and the remapping done in there
required for non-coherent architectures. Instead I'd like to allocate
plain old pages, and then just dma map them for each device separately,
with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
or last user to unmap. On the iommu side it could probably work
similar.
I think this is what's often done. Except then there's also the issue
of how to get at the cma allocator if your device needs something
contiguous. There's a lot of that still going on in graphics/media.
I suppose one could just expose CMA with the default pool directly.
It's just an allocator, so not sure why it would need any
device-specific information.
There is also the use case of using CMA with device-specific pools of
memory reusable by the system when not used by the device and those
would have to somehow get the pool to allocate from, but I wonder if
struct device is the right way to pass such information. I'd see the
pool given explicitly like cma_alloc(struct cma_pool *, size, flags)
and perhaps a wrapper cma_alloc_default(size, flags) that is just a
simple macro calling cma_alloc(&cma_pool_default, size, flags).
Best regards,
Tomasz
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Christoph Hellwig
2018-11-30 09:44:25 UTC
Permalink
Post by Daniel Vetter
Post by Tomasz Figa
Whether the cache maintenance operation needs to actually do anything
or not is a function of `dev`. We can have some devices that are
coherent with CPU caches, and some that are not, on the same system.
So the thing is that the gpu driver knows this too. It fairly often can
even decide at runtime (through surface state bits or gpu pte bits)
whether to use coherent or non-coherent transactions. dma-api assuming
that a given device is always coherent or non-coherent is one of the
fundamental mismatches we have.
If you otoh need dev because there's some strange bridge caches you need
to flush (I've never seen that, but who knows), that would be a diffeernt
thing. All the bridge flushing I've seen is attached to the iommu though,
so would be really a surprise if the cache management needs that too.
Strange bridge caches aren't the problem. Outside of magic components
like SOCs integrated GPUs the issue is that a platform can wire up a
PCIe/AXI/etc bus either so that it is cache coherent, or not cache
coherent (does not snooping). Drivers need to use the full DMA API
include dma_sync_* to cater for the non-coherent case, which will turn
into no-ops if DMA is coherent.

Now PCIe now has unsnooped transactions, which can be non-coherent even
if the bus would otherwise be coherent. We have so far very much
ignored those in Linux (at least Linux in general, I know you guys have
some driver-local hacks), but if that use case, or similar ones for GPUs
on SOCs become common we need to find a solution.

One of the major issues here is that architectures that always are
DMA coherent might not even have the cache flushing instructions, or even
if they do we have not wired them up in the DMA code as we didn't need
them.

So what we'd need to support this properly is to do something like:

- add new arch hook that allows an architecture to say it supports
optional non-coherent at the arch level, and for a given device
- wire up arch_sync_dma_for_{device,cpu} for those architectures that
define it if they don't currently have it (e.g. x86)
- add a new DMA_ATTR_* flag to opt into cache flushing even if the
device declares it is otherwise coherent

And I'd really like to see that work driven by an actual user.
Christoph Hellwig
2018-11-29 18:33:34 UTC
Permalink
Post by Daniel Vetter
What kind of abuse do you expect? It could very well be that gpu folks
call that "standard use case" ... At least on x86 with the i915 driver
we pretty much rely on architectural guarantees for how cache flushes
work very much. Down to userspace doing the cache flushing for
mappings the kernel has set up.
Mostly the usual bypasses of the DMA API because people know better
(and with that I don't mean low-level IOMMU API users, but "creative"
direct mappings).
Post by Daniel Vetter
Post by Christoph Hellwig
As for the buffer sharing: at least for the DMA API side I want to
move the current buffer sharing users away from dma_alloc_coherent
(and coherent dma_alloc_attrs users) and the remapping done in there
required for non-coherent architectures. Instead I'd like to allocate
plain old pages, and then just dma map them for each device separately,
with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
or last user to unmap. On the iommu side it could probably work
similar.
I think this is what's often done. Except then there's also the issue
of how to get at the cma allocator if your device needs something
contiguous. There's a lot of that still going on in graphics/media.
Being able to dip into CMA and mayb iommu coalescing if we want to
get fancy is indeed the only reason for this API. If we just wanted
to map pages we could already do that now with just a little bit
of boilerplate code (and quite a few drivers do - just adding this
new API will remove tons of code).
Daniel Vetter
2018-11-30 09:46:04 UTC
Permalink
Post by Christoph Hellwig
Post by Daniel Vetter
What kind of abuse do you expect? It could very well be that gpu folks
call that "standard use case" ... At least on x86 with the i915 driver
we pretty much rely on architectural guarantees for how cache flushes
work very much. Down to userspace doing the cache flushing for
mappings the kernel has set up.
Mostly the usual bypasses of the DMA API because people know better
(and with that I don't mean low-level IOMMU API users, but "creative"
direct mappings).
Ah yes. I think that even gpu folks get behind :-) Best if someone
bothered to explicitly cast to dma_addr_t to shut up the tools, but not
fix the bug ...
Post by Christoph Hellwig
Post by Daniel Vetter
Post by Christoph Hellwig
As for the buffer sharing: at least for the DMA API side I want to
move the current buffer sharing users away from dma_alloc_coherent
(and coherent dma_alloc_attrs users) and the remapping done in there
required for non-coherent architectures. Instead I'd like to allocate
plain old pages, and then just dma map them for each device separately,
with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
or last user to unmap. On the iommu side it could probably work
similar.
I think this is what's often done. Except then there's also the issue
of how to get at the cma allocator if your device needs something
contiguous. There's a lot of that still going on in graphics/media.
Being able to dip into CMA and mayb iommu coalescing if we want to
get fancy is indeed the only reason for this API. If we just wanted
to map pages we could already do that now with just a little bit
of boilerplate code (and quite a few drivers do - just adding this
new API will remove tons of code).
Sounds like the future is very bright indeed \o/
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Christoph Hellwig
2018-12-07 01:38:41 UTC
Permalink
Post by Daniel Vetter
Being able to dip into CMA and maybe iommu coalescing if we want to
get fancy is indeed the only reason for this API. If we just wanted
to map pages we could already do that now with just a little bit
of boilerplate code (and quite a few drivers do - just adding this
new API will remove tons of code).
Sounds like the future is very bright indeed \o/
So, I spent some time with this and instead of a new API I think
it makes sure we have DMA_ATTR_NON_CONSISTENT consistently available
and with well defined semantics, that is virt_to_page on the return
value works, it is contiguos and we can use dma_sync_single_for_cpu
and dma_sync_single_for_device for ownership tranfers.

Can you graphics folks check if this:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator

is something to work with? Especially to get rid of the horrible
dma_get_sgtable hacks?
Rob Clark
2018-12-07 14:29:42 UTC
Permalink
Post by Christoph Hellwig
Post by Daniel Vetter
Being able to dip into CMA and maybe iommu coalescing if we want to
get fancy is indeed the only reason for this API. If we just wanted
to map pages we could already do that now with just a little bit
of boilerplate code (and quite a few drivers do - just adding this
new API will remove tons of code).
Sounds like the future is very bright indeed \o/
So, I spent some time with this and instead of a new API I think
it makes sure we have DMA_ATTR_NON_CONSISTENT consistently available
and with well defined semantics, that is virt_to_page on the return
value works, it is contiguos and we can use dma_sync_single_for_cpu
and dma_sync_single_for_device for ownership tranfers.
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-noncoherent-allocator
is something to work with? Especially to get rid of the horrible
dma_get_sgtable hacks?
I'm not sure I really have strong opinion about this. For some of the
display-only SoC drm drivers which use the CMA helpers, it might be
useful.

For the drm drivers for real GPUs, we aren't really using the dma api
to allocate in the first place. (And the direction is towards more
support for userptr allocations)

BR,
-R

Brian Starkey
2018-11-29 17:33:03 UTC
Permalink
Hi Christoph,
Post by Christoph Hellwig
As for the buffer sharing: at least for the DMA API side I want to
move the current buffer sharing users away from dma_alloc_coherent
(and coherent dma_alloc_attrs users) and the remapping done in there
required for non-coherent architectures. Instead I'd like to allocate
plain old pages, and then just dma map them for each device separately,
with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
or last user to unmap. On the iommu side it could probably work
similar.
I have done some preliminary work on this, and want to get it into this
merge window, but there is a few other bits I need to sort out first.
This sounds very useful for ion, to avoid CPU cache maintenance as
long as the buffer stays in device-land.

One question though: How would you determine "the last user to unmap"
to know when to do the final "make visible to CPU" step?

Thanks,
-Brian
Christoph Hellwig
2018-11-29 18:35:50 UTC
Permalink
Post by Brian Starkey
This sounds very useful for ion, to avoid CPU cache maintenance as
long as the buffer stays in device-land.
One question though: How would you determine "the last user to unmap"
to know when to do the final "make visible to CPU" step?
I'd assume the user of the DMA API keeps track of it. E.g. dmabuf
would be a pretty natural layer to implement common logic for this.
Rob Clark
2018-11-30 01:15:23 UTC
Permalink
Post by Christoph Hellwig
Post by Daniel Vetter
Yeah we had patches to add manual cache management code to drm, so we
don't have to abuse the dma streaming api anymore. Got shouted down.
Abusing the dma streaming api also gets shouted down. It's a gpu, any
idea of these drivers actually being platform independent is out of
the window from the start anyway, so we're ok with tying this to
platforms.
Manual or not the iommu API is missing APIs for cache management,
which makes it kinda surprising it actually ever worked for non-coherent
devices.
And fortunately while some people spent the last year ot two bickering
about the situation others actually did work, and we now have a
generic arch_sync_dma_for_device/arch_sync_dma_for_cpu kernel-internal
API. This is only used for DMA API internals so far, and explicitly
not intended for direct driver use, but it would be perfect as the
backend for iommu API cache maintainance functions. It exists on all
but two architectures on mainline. Out of those powerpc is in the works,
only arm32 will need some major help.
oh, hmm, I realized I'd been looking still at the armv7 dma-mapping, I
hadn't noticed arch_sync_* yet.. that does look like a step in the
right direction.

As far as hiding cache ops behind iommu layer, I guess I'd been
thinking more of just letting the drivers that want to bypass dma
layer take things into their own hands.. tbh I think/assume/hope
drm/msm is more the exception than the rule as far as needing to
bypass the dma layer. Or at least I guess the # of drivers having
problems w/ the dma layer is less than the # of iommu drivers.

(Not sure if that changes my thoughts on this patch, it isn't like
what this patch replaces isn't also a problematic hack around the
inability to bypass the dma layer. In the short term I just want
*something* that works, I'm totally happy to refactor later when there
are better options.)

BR,
-R
Christoph Hellwig
2018-11-30 09:35:41 UTC
Permalink
Post by Rob Clark
As far as hiding cache ops behind iommu layer, I guess I'd been
thinking more of just letting the drivers that want to bypass dma
layer take things into their own hands.. tbh I think/assume/hope
drm/msm is more the exception than the rule as far as needing to
bypass the dma layer. Or at least I guess the # of drivers having
problems w/ the dma layer is less than the # of iommu drivers.
So the whole bypass thing has already been a contentious discussion
in the past. One thing that the API aready enforces is that we pass
a DMA direction, which I want to keep. The other is that we need
a struct device (or something similiar) that is used to check if the
device is cache coherent or not. In your MSM case you might know that,
but in general it really is a platform issue that I don't want every
driver to know about.
Post by Rob Clark
(Not sure if that changes my thoughts on this patch, it isn't like
what this patch replaces isn't also a problematic hack around the
inability to bypass the dma layer. In the short term I just want
*something* that works, I'm totally happy to refactor later when there
are better options.)
The current patch is simply broken. You can't just construct your
own S/G list and pass it to the DMA API, and we enforce that very
strictly with dma debug enabled.

So your only options are: a) actually use the DMA API for creating
the mapping, by e.g. using dma_direct_ops ontop of an actual IOMMU
managed in the background, or b) figure out a way to do cache
maintainance for raw IOMMU API drivers.
Christoph Hellwig
2018-11-29 15:53:11 UTC
Permalink
Post by Rob Clark
Post by Christoph Hellwig
As I told you before: hell no. If you spent the slightest amount of
actually trying to understand what you are doing here you'd know this
can't work. Just turn on dma debugging and this will blow up in your
face.
you can tone it down.. we weren't the ones who created the dma/iommu
mess, we are just trying to find a way to work around it
But you don't listen and take someone busy for a day as approval,
which is absolutely not the case. Please wait for a day or two for
people to respond instead of forcing already NACKed patches again in
an act of passive agression.
Rob Clark
2018-11-29 18:44:11 UTC
Permalink
Post by Christoph Hellwig
Post by Rob Clark
Post by Christoph Hellwig
As I told you before: hell no. If you spent the slightest amount of
actually trying to understand what you are doing here you'd know this
can't work. Just turn on dma debugging and this will blow up in your
face.
you can tone it down.. we weren't the ones who created the dma/iommu
mess, we are just trying to find a way to work around it
But you don't listen and take someone busy for a day as approval,
which is absolutely not the case. Please wait for a day or two for
people to respond instead of forcing already NACKed patches again in
an act of passive agression.
I'm not sending my pull-req to Dave tomorrow (or even the day after
;-)), so there is ofc still time for people to respond.

BR,
-R
Loading...