Bug 106631

Summary:	PALM: clpeak: Bus error (core dumped) & lots of GPU lockup
Product:	Mesa	Reporter:	Ricardo Ribalda <ricardo.ribalda>
Component:	Drivers/Gallium/r600	Assignee:	Default DRI bug account <dri-devel>
Status:	RESOLVED MOVED	QA Contact:	Default DRI bug account <dri-devel>
Severity:	normal
Priority:	medium
Version:	18.0
Hardware:	Other
OS:	All
Bug Blocks:	99553
Attachments:	dmesg, dmesg 100k

Description Ricardo Ribalda 2018-05-23 13:51:30 UTC

root@qt5022:~# time clpeak 

Platform: Clover
  Device: AMD PALM (DRM 2.50.0 / 4.16.0-qtec-standard, LLVM 6.0.1)
    Driver version  : 18.0.3 (Linux x64)
    Compute units   : 2
    Clock frequency : 0 MHz

    Global memory bandwidth (GBPS)
      float   : 5.42
      float2  : 7.10
      float4  : 6.69
      float8  : 4.88
      float16 : 0.43

    Single-precision compute (GFLOPS)
      float   : 18.90
      float2  : 36.90
      float4  : 38.66
      float8  : 42.19
      float16 : 53.13

    No half precision support! Skipped

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 9.48
      int2  : Bus error (core dumped)

real	20m10.785s
user	15m58.717s
sys	1m5.031s

Comment 1 Ricardo Ribalda 2018-05-23 13:52:24 UTC

Created attachment 139710: dmesg

Comment 2 Ricardo Ribalda 2018-05-23 13:53:04 UTC

libclc version: a2118d58fca567694edfabea78293e0dc9255500 (current HEAD)

Comment 3 Jan Vesely 2018-05-23 16:52:14 UTC

It looks like the benchmark kernels need more than the 10 s allotted by the GPU lockup timeout to complete. You can adjust this limit via the radeon.lockup_timeout kernel module parameter.
You can check the value currently in use at:
/sys/module/radeon/parameters/lockup_timeout
but you'll need to set it at boot time.
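
As a minimal sketch (not part of the original report), the check above could be scripted in Python. The sysfs path is the one given in this comment; the helper names `read_lockup_timeout` and `describe_timeout` are hypothetical. The sketch assumes the value is in milliseconds (consistent with the 10 s figure above) and that 0 means no time limit. Setting the value at boot is typically done by adding `radeon.lockup_timeout=<ms>` to the kernel command line, though the exact mechanism depends on the distribution.

```python
from pathlib import Path

# Path taken from the comment above.
LOCKUP_TIMEOUT_PATH = Path("/sys/module/radeon/parameters/lockup_timeout")


def read_lockup_timeout(path=LOCKUP_TIMEOUT_PATH):
    """Return the timeout as an int, or None if the file is missing (hypothetical helper)."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def describe_timeout(timeout_ms):
    """Render the value for a human; assumes ms units and 0 == no limit."""
    if timeout_ms is None:
        return "radeon module not loaded or parameter not exposed"
    if timeout_ms == 0:
        return "no time limit"
    return f"{timeout_ms} ms ({timeout_ms / 1000:g} s)"


if __name__ == "__main__":
    print(describe_timeout(read_lockup_timeout()))
```

Keeping the file read separate from the formatting means the helper can be pointed at any file for testing on a machine without the radeon module.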


(In reply to Ricardo Ribalda from comment #0)
> 
> real	20m10.785s
> user	15m58.717s
> sys	1m5.031s

oh, that's pretty slow...

Comment 4 Ricardo Ribalda 2018-05-24 07:56:49 UTC

Hi Jan,

I have increased lockup_timeout to 100000 (100 s) and no longer get the bus error, but I still get similar errors in dmesg.

root@qt5022:~# cat /sys/module/radeon/parameters/lockup_timeout
100000

root@qt5022:~# time clpeak

Platform: Clover
  Device: AMD PALM (DRM 2.50.0 / 4.16.0-qtec-standard, LLVM 6.0.1)
    Driver version  : 18.0.3 (Linux x64)
    Compute units   : 2
    Clock frequency : 0 MHz

    Global memory bandwidth (GBPS)
      float   : 5.46
      float2  : 7.17
      float4  : 6.79
      float8  : 4.89
      float16 : 0.11

    Single-precision compute (GFLOPS)
      float   : 18.95
      float2  : 36.90
      float4  : 38.69
      float8  : 42.19
      float16 : 53.17

    No half precision support! Skipped

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 9.49
      int2  : 18.52
      int4  : 18.41
      int8  : 18.59
      int16 : 18.05

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.21
      enqueueReadBuffer          : 0.63
      enqueueMapBuffer(for read) : 2.30
        memcpy from mapped ptr   : 0.87
      enqueueUnmap(after write)  : 506.05
        memcpy to mapped ptr     : 0.88

    Kernel launch latency : 608.36 us


real	29m45.765s
user	18m55.669s
sys	1m7.651s

Comment 5 Ricardo Ribalda 2018-05-24 07:57:47 UTC

Created attachment 139734: dmesg 100k

Comment 6 Ricardo Ribalda 2018-05-24 08:07:54 UTC

Even though it is not directly comparable, here is the result with fglrx for reference.

root@qt5022:~# time clpeak 

Platform: AMD Accelerated Parallel Processing
  Device: AMD G-T56N Processor
    Driver version  : 1800.8 (sse2) (Linux x64)
    Compute units   : 2
    Clock frequency : 530 MHz

    Global memory bandwidth (GBPS)
      float   : 0.80
      float2  : 1.12
      float4  : 1.09
      float8  : 1.31
      float16 : 1.34

    Single-precision compute (GFLOPS)
      float   : 0.59
      float2  : 1.16
      float4  : 2.32
      float8  : 4.42
      float16 : 0.85

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 0.43
      double2  : 0.84
      double4  : 1.46
      double8  : 1.41
      double16 : 0.28

    Integer compute (GIOPS)
      int   : 0.73
      int2  : 0.30
      int4  : 0.35
      int8  : 0.40
      int16 : 0.32

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.29
      enqueueReadBuffer          : 1.08
      enqueueMapBuffer(for read) : 3591.11
        memcpy from mapped ptr   : 0.98
      enqueueUnmap(after write)  : 15339.17
        memcpy to mapped ptr     : 0.99

    Kernel launch latency : 64.41 us


real	8m12.337s
user	12m51.022s
sys	0m29.840s

Comment 7 Jan Vesely 2018-05-24 17:51:12 UTC

It looks like even 100 s is not enough. Can you try running with no time limit (lockup_timeout set to 0)?
Looking at the numbers, I think Mesa's results are inflated because the kernels get killed before finishing the computation.
A run that completes can take significantly longer.
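
To illustrate the reasoning, here is a small sketch with made-up numbers (none of them measured in this report). It assumes a benchmark that reports throughput as the work it intended to do divided by the elapsed wall-clock time; whether clpeak computes its figures exactly this way is an assumption. If a kernel is aborted early, the elapsed time shrinks while the assumed work does not, so the reported figure goes up.

```python
def reported_gflops(intended_flop, elapsed_s):
    """Throughput as such a benchmark would report it: intended work / measured time."""
    return intended_flop / elapsed_s / 1e9


# Hypothetical workload: 200 GFLOP that genuinely needs 100 s on the device.
intended = 200e9
true_runtime_s = 100.0
killed_after_s = 10.0  # kernel aborted at an illustrative 10 s limit

honest = reported_gflops(intended, true_runtime_s)
inflated = reported_gflops(intended, killed_after_s)

print(f"completed run : {honest:.2f} GFLOPS")
print(f"aborted run   : {inflated:.2f} GFLOPS")
```

With these illustrative inputs, the aborted run appears ten times faster even though it did not finish the work.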

Did you by any chance build LLVM in debug mode? That can inflate kernel compile times significantly.

Comment 8 Ricardo Ribalda 2018-05-24 17:55:11 UTC

I am using llvm/clang from https://github.com/kraj/meta-clang . Can you point me to something to check if the debug mode is enabled or not?

Thanks

Comment 9 Ricardo Ribalda 2018-05-24 18:01:21 UTC

(In reply to Ricardo Ribalda from comment #8)
> I am using llvm/clang from https://github.com/kraj/meta-clang . Can you
> point me to something to check if the debug mode is enabled or not?
> 
> Thanks

Answer to myself. Seems to be a Release build :
https://github.com/kraj/meta-clang/blob/master/recipes-devtools/clang/clang_git.bb#L78

But if you can tell me how to verify it in runtime I would love to try it

Comment 10 Jan Vesely 2018-05-24 21:24:15 UTC

(In reply to Ricardo Ribalda from comment #9)
> (In reply to Ricardo Ribalda from comment #8)
> > I am using llvm/clang from https://github.com/kraj/meta-clang . Can you
> > point me to something to check if the debug mode is enabled or not?
> > 
> > Thanks
> 
> Answer to myself. Seems to be a Release build :
> https://github.com/kraj/meta-clang/blob/master/recipes-devtools/clang/clang_git.bb#L78
> 
> But if you can tell me how to verify it in runtime I would love to try it

$ llvm-config --assertion-mode
and
$ llvm-config --build-mode

A release build won't change the GPU kernel running time, but it might speed up the kernel compilation time.
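
A minimal sketch of automating the two checks above, in Python. The `llvm-config` flags are the ones quoted in this comment; the helper names `query_llvm_config` and `is_optimized_build` are hypothetical, and treating every build mode other than "Debug" with assertions "OFF" as optimized is an assumption made for illustration.

```python
import shutil
import subprocess


def is_optimized_build(assertion_mode, build_mode):
    """True when assertions are off and the build mode is not Debug (assumed rule)."""
    return (assertion_mode.strip().upper() == "OFF"
            and build_mode.strip().lower() != "debug")


def query_llvm_config(flag, exe="llvm-config"):
    """Run `llvm-config <flag>` and return its output, or None if the tool is missing."""
    if shutil.which(exe) is None:
        return None
    result = subprocess.run([exe, flag], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    assertions = query_llvm_config("--assertion-mode")
    build = query_llvm_config("--build-mode")
    if assertions is None or build is None:
        print("llvm-config not found on PATH")
    else:
        verdict = is_optimized_build(assertions, build)
        print(f"assertions={assertions} build={build} optimized={verdict}")
```

Note that this only describes the LLVM that `llvm-config` on the PATH belongs to, which may differ from the LLVM that Mesa was actually linked against.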

Comment 11 Ricardo Ribalda 2018-05-29 13:26:25 UTC

It seems to be a release build with assertions off:

root@qt5022:~# llvm-config --assertion-mode
OFF
root@qt5022:~# llvm-config --build-mode    
Release

Comment 12 GitLab Migration User 2019-09-18 19:25:54 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new issue on our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/638.
