Discussion:
[PATCH] dma-buf: fix debugfs versus rcu and fence dumping
j***@redhat.com
2018-12-06 01:41:03 UTC
From: Jérôme Glisse <***@redhat.com>

The debugfs code takes references on fences without dropping them. Also, the
RCU sections are not well balanced. Fix all that ...

Signed-off-by: Jérôme Glisse <***@redhat.com>
Cc: Christian König <***@amd.com>
Cc: Daniel Vetter <***@ffwll.ch>
Cc: Sumit Semwal <***@linaro.org>
Cc: linux-***@vger.kernel.org
Cc: dri-***@lists.freedesktop.org
Cc: linaro-mm-***@lists.linaro.org
Cc: Stéphane Marchesin <***@chromium.org>
Cc: ***@vger.kernel.org
---
drivers/dma-buf/dma-buf.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 13884474d158..f6f4de42ac49 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1051,24 +1051,31 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
fobj = rcu_dereference(robj->fence);
shared_count = fobj ? fobj->shared_count : 0;
fence = rcu_dereference(robj->fence_excl);
+ fence = dma_fence_get_rcu(fence);
if (!read_seqcount_retry(&robj->seq, seq))
break;
rcu_read_unlock();
}
-
- if (fence)
+ if (fence) {
seq_printf(s, "\tExclusive fence: %s %s %ssignalled\n",
fence->ops->get_driver_name(fence),
fence->ops->get_timeline_name(fence),
dma_fence_is_signaled(fence) ? "" : "un");
+ dma_fence_put(fence);
+ }
+
+ rcu_read_lock();
for (i = 0; i < shared_count; i++) {
fence = rcu_dereference(fobj->shared[i]);
if (!dma_fence_get_rcu(fence))
continue;
+ rcu_read_unlock();
seq_printf(s, "\tShared fence: %s %s %ssignalled\n",
fence->ops->get_driver_name(fence),
fence->ops->get_timeline_name(fence),
dma_fence_is_signaled(fence) ? "" : "un");
+ dma_fence_put(fence);
+ rcu_read_lock();
}
rcu_read_unlock();
--
2.17.2
Koenig, Christian
2018-12-06 08:09:28 UTC
Post by j***@redhat.com
The debugfs code takes references on fences without dropping them. Also, the
RCU sections are not well balanced. Fix all that ...
Well NAK, you are now taking the RCU lock twice, and dropping the RCU lock while
still accessing fobj has a huge potential for accessing freed-up memory.

The only correct thing I can see here is to grab a reference to the
fence before printing any info on it.
Christian.
Jerome Glisse
2018-12-06 15:21:17 UTC
Post by Koenig, Christian
Post by j***@redhat.com
The debugfs code takes references on fences without dropping them. Also, the
RCU sections are not well balanced. Fix all that ...
Well NAK, you are now taking the RCU lock twice, and dropping the RCU lock while
still accessing fobj has a huge potential for accessing freed-up memory.
The only correct thing I can see here is to grab a reference to the
fence before printing any info on it.
Christian.
Huh? That is exactly what I am doing: take a reference under RCU,
RCU unlock, print the fence info, drop the fence reference, RCU
lock, rinse and repeat ...

Note that fobj in the _existing_ code is accessed outside the RCU
section, and that there is an RCU imbalance in that code, i.e. a lonely
rcu_unlock after the for loop.

So the existing code is broken.
Koenig, Christian
2018-12-06 16:08:12 UTC
Post by Jerome Glisse
Post by Koenig, Christian
Post by j***@redhat.com
The debugfs code takes references on fences without dropping them. Also, the
RCU sections are not well balanced. Fix all that ...
Well NAK, you are now taking the RCU lock twice, and dropping the RCU lock while
still accessing fobj has a huge potential for accessing freed-up memory.
The only correct thing I can see here is to grab a reference to the
fence before printing any info on it.
Christian.
Huh? That is exactly what I am doing: take a reference under RCU,
RCU unlock, print the fence info, drop the fence reference, RCU
lock, rinse and repeat ...
Note that fobj in the _existing_ code is accessed outside the RCU
section, and that there is an RCU imbalance in that code, i.e. a lonely
rcu_unlock after the for loop.
So the existing code is broken.
No, the existing code is perfectly fine.

Please note the break in the loop before the rcu_unlock();
Post by Jerome Glisse
if (!read_seqcount_retry(&robj->seq, seq))
break; <- HERE!
rcu_read_unlock();
}
So your patch breaks that and takes the RCU read lock twice.

Regards,
Christian.
Jerome Glisse
2018-12-06 16:19:51 UTC
Post by Koenig, Christian
Post by Jerome Glisse
Post by Koenig, Christian
Post by j***@redhat.com
The debugfs code takes references on fences without dropping them. Also, the
RCU sections are not well balanced. Fix all that ...
Well NAK, you are now taking the RCU lock twice, and dropping the RCU lock while
still accessing fobj has a huge potential for accessing freed-up memory.
The only correct thing I can see here is to grab a reference to the
fence before printing any info on it.
Christian.
Huh? That is exactly what I am doing: take a reference under RCU,
RCU unlock, print the fence info, drop the fence reference, RCU
lock, rinse and repeat ...
Note that fobj in the _existing_ code is accessed outside the RCU
section, and that there is an RCU imbalance in that code, i.e. a lonely
rcu_unlock after the for loop.
So the existing code is broken.
No, the existing code is perfectly fine.
Please note the break in the loop before the rcu_unlock();
Post by Jerome Glisse
if (!read_seqcount_retry(&robj->seq, seq))
break; <- HERE!
rcu_read_unlock();
}
So your patch breaks that and takes the RCU read lock twice.
OK, I missed that. I wonder if the refcount imbalance explains
the crash that was reported to me ... I sent a patch just for
that.

Thank you for reviewing and pointing out the code I was
oblivious to :)

Cheers,
Jérôme
Sasha Levin
2018-12-07 13:30:49 UTC
Hi,

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v4.19.7, v4.14.86, v4.9.143, v4.4.166, v3.18.128.

v4.19.7: Build OK!
v4.14.86: Build OK!
v4.9.143: Failed to apply! Possible dependencies:
5eb2c72c8acc ("dma-buf: fence debugging")

v4.4.166: Failed to apply! Possible dependencies:
5eb2c72c8acc ("dma-buf: fence debugging")

v3.18.128: Failed to apply! Possible dependencies:
5eb2c72c8acc ("dma-buf: fence debugging")


How should we proceed with this patch?

--
Thanks,
Sasha
