Bug 99025

Summary: [KVM][GVT-d] [BDW & SKL] Ubuntu 16.04 guest hits a kernel panic during boot with the newest 4.9.0+ drm-intel kernel
Product: DRI
Reporter: Terrence Xu <terrence.xu>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: dorota.czaplejewicz, gordon.jin, intel-gfx-bugs, jani.saarinen, tomeu, xiong.y.zhang, zhiyuan.lv
Version: DRI git
Keywords: bisected, regression
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: BDW, SKL
i915 features:
Attachments (description; flags):
dmesg-guest.log; none
kernel config file; none
dmesg-guest-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s; none
dmesg-guest-slubdebug-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s; none
dmesg-guest-full-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s; none
dmesg-guest-full-20170214-drm-tip: 2017y-02m-14d-22h-44m-17s; none

Description Terrence Xu 2016-12-08 09:17:51 UTC
System Environment
=======
Host kernel repo: kvm.git
Host commit: master-813ae37e
Guest repo: drm-intel.git
Guest commit: drm-intel-next-queued-312c3c46

Regression?
=======
Yes

Bug detailed description
=======
The guest hits a kernel panic during boot with the latest drm-intel 4.9.0-rc4+ kernel, but boots successfully with drm-intel 4.8.0-rc2+.
This issue occurs in a KVM GVT-d environment.

Reproduce Steps
==============
Boot up Ubuntu 16.04 guest with the drm-intel kernel, the command as below:
qemu-system-x86_64 --enable-kvm -m 2048 -smp 4 -hda /root/ubuntu-16.04.img -usb -usbdevice tablet -device virtio-net-pci,netdev=nic0,mac=00:16:3e:60:0a:50 -netdev tap,id=nic0,script=/etc/kvm/qemu-ifup -serial stdio

Expected Result
=============
The guest boots up successfully.

Actual Result
===========
The guest hits a kernel panic during boot.

Analysis & Root Cause
===================
Ubuntu 16.04.1 LTS gvt-ub16 ttyS0

gvt-ub16 login: root
Password:
Last login: Mon Feb  6 18:11:05 CST 2017 from 192.168.101.32 on pts/4
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.9.0-rc4+ x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

219 packages can be updated.
0 updates are security updates.

[   15.559269] general protection fault: 0000 [#1] SMP
[   15.560254] Modules linked in: fuse serio_raw sg acpi_cpufreq i2c_piix4 i2c_core parport_pc ppdev lp parport ext4 jbd2 mbcache sr_mod sd_mod cdrom ata_generic pata_acpi virtio_net virtio_pci ata_piix virtio_ring libata virtio floppy
[   15.565042] CPU: 3 PID: 1449 Comm: systemd-logind Not tainted 4.9.0-rc4+ #6
[   15.566244] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.0-0-gd7adf60-prebuilt.qemu-project.org 04/01/2014
[   15.568460] task: ffff8800288667c0 task.stack: ffffc90000e84000
[   15.569561] RIP: 0010:[<ffffffff81204bdb>]  [<ffffffff81204bdb>] __kmalloc_track_caller+0xbb/0x200
[   15.571234] RSP: 0018:ffffc90000e87da8  EFLAGS: 00010286
[   15.572217] RAX: 0000000000000000 RBX: 00000000024000c0 RCX: 0000000000000fd2
[   15.573537] RDX: 0000000000000fd1 RSI: 0000000000000000 RDI: 000000000001c6e0
[   15.574840] RBP: ffffc90000e87de0 R08: ffff88007fd9c6e0 R09: ffff88007d003cc0
[   15.576171] R10: ffff007366706d74 R11: ffff88007985e9f8 R12: 00000000024000c0
[   15.577476] R13: 0000000000000006 R14: ffffffff811bbe63 R15: ffff88007d003cc0
[   15.578926] FS:  00007f12aa0db8c0(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000
[   15.580474] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.581510] CR2: 00007f12aa0f8000 CR3: 000000007badf000 CR4: 00000000000006e0
[   15.582497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   15.583305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   15.584844] Stack:
[   15.585276]  ffff880066f1d00c ffffffff82232718 ffff88007cb86a80 0000000000000006
[   15.586537]  00000000024000c0 ffff88007cb86a80 0000000000000000 ffffc90000e87e08
[   15.587500]  ffffffff811bbe11 ffff880066f1cf00 ffffffff822326bc ffff880066f1d00c
[   15.588455] Call Trace:
[   15.588760]  [<ffffffff811bbe11>] kstrdup+0x31/0x60
[   15.589362]  [<ffffffff811bbe63>] kstrdup_const+0x23/0x30
[   15.590028]  [<ffffffff81249500>] alloc_vfsmnt+0xb0/0x220
[   15.590669]  [<ffffffff812496a6>] vfs_kern_mount+0x36/0x110
[   15.591357]  [<ffffffff8124bf09>] do_mount+0x1e9/0xd10
[   15.591944]  [<ffffffff8124cd65>] SyS_mount+0x95/0xe0
[   15.592478]  [<ffffffff816ed3b7>] entry_SYSCALL_64_fastpath+0x1a/0xa9
[   15.593242] Code: 08 65 4c 03 05 77 55 e0 7e 49 83 78 10 00 4d 8b 10 0f 84 ce 00 00 00 4d 85 d2 0f 84 c5 00 00 00 49 63 41 20 48 8d 4a 01 49 8b 39 <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
[   15.596546] RIP  [<ffffffff81204bdb>] __kmalloc_track_caller+0xbb/0x200
[   15.597374]  RSP <ffffc90000e87da8>
[   15.597820] ---[ end trace 83cb5720d8dea4cd ]---
[   15.598404] Kernel panic - not syncing: Fatal exception
[   15.599176] Kernel Offset: disabled
Comment 1 Terrence Xu 2016-12-08 14:40:22 UTC
Created attachment 128382
dmesg-guest.log

Attach the full guest dmesg log.
Comment 2 Terrence Xu 2017-01-04 09:07:22 UTC
With the newest drm-intel-testing tag (drm-intel-testing-2016-12-26), this issue still exists.

Ubuntu guest dmesg output:
[    0.519993] kvm: no hardware support
[    1.696007] [drm:intel_sbi_read] *ERROR* timeout waiting for SBI to complete read transaction
[    1.799010] [drm:intel_sbi_write] *ERROR* timeout waiting for SBI to complete write transaction
[    6.903747] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[    6.904648] IP: reset_common_ring+0xc3/0x130
[    6.905065] PGD 366dc067
[    6.905065] PUD 365c8067
[    6.905339] PMD 0
[    6.905643]
[    6.905998] Oops: 0000 [#1] PREEMPT SMP
[    6.906378] Modules linked in: e1000
[    6.906821] CPU: 0 PID: 21 Comm: kworker/0:1 Not tainted 4.9.0+ #7
[    6.907426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[    6.908686] Workqueue: events_long i915_hangcheck_elapsed
[    6.909217] task: ffff88007d279b80 task.stack: ffffc900000a8000
[    6.909841] RIP: 0010:reset_common_ring+0xc3/0x130
[    6.910317] RSP: 0018:ffffc900000abb88 EFLAGS: 00010286
[    6.910866] RAX: 0000000080000000 RBX: ffff880036e56000 RCX: 0000000000002030
[    6.911563] RDX: 00000000ffffffff RSI: ffffffff81549593 RDI: 0000000000000000
[    6.912305] RBP: ffffc900000abba8 R08: ffffffff81ad8b20 R09: ffffffff81cbbd7c
[    6.913037] R10: 0000000000000000 R11: 0000000000000040 R12: ffff88007c444000
[    6.913771] R13: ffff88007d312600 R14: ffff880079bb0000 R15: ffff880036e56000
[    6.914466] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    6.915298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.915927] CR2: 0000000000000070 CR3: 0000000036234000 CR4: 00000000003406f0
[    6.920229] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.920989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.921716] Call Trace:
[    6.921968]  i915_gem_reset+0x248/0x3c0
[    6.922360]  ? _raw_spin_unlock_irqrestore+0xe/0x10
[    6.922860]  ? __irq_put_desc_unlock+0x1e/0x40
[    6.923308]  i915_reset+0xdd/0x160
[    6.923666]  i915_reset_and_wakeup+0xe9/0x150
[    6.924098]  i915_handle_error+0x1a0/0x210
[    6.924512]  ? scnprintf+0x3d/0x70
[    6.924872]  hangcheck_declare_hang+0xcb/0xf0
[    6.925315]  ? intel_engine_get_active_head+0xb4/0xe0
[    6.925853]  i915_hangcheck_elapsed+0x27f/0x2b0
[    6.926326]  process_one_work+0x13d/0x4a0
[    6.926773]  worker_thread+0x48/0x4e0
[    6.927141]  ? _raw_write_unlock_irqrestore+0x2e/0x60
[    6.927632]  ? preempt_count_sub+0x4c/0x80
[    6.928084]  kthread+0x101/0x140
[    6.928416]  ? process_one_work+0x4a0/0x4a0
[    6.928877]  ? kthread_create_on_node+0x40/0x40
[    6.929327]  ret_from_fork+0x2a/0x40
[    6.929675] Code: 41 5e 5d c3 41 8b 44 24 20 4c 89 f7 b9 01 00 00 00 ba 00 00 ff ff 8d b0 a0 03 00 00 41 ff 96 58 07 00 00 49 8b bc 24 80 02 00 00 <48> 8b 47 70 48 39 43 70 74 43 48 85 ff 74 06 3e 83 2f 01 74 50
[    6.931595] RIP: reset_common_ring+0xc3/0x130 RSP: ffffc900000abb88
[    6.932258] CR2: 0000000000000070
[    6.932588] ---[ end trace 30ecd9ef57e73e63 ]---
Comment 3 Gordon Jin 2017-01-06 10:14:23 UTC
Could an i915 developer look into this issue?
This is blocking GVT-d (a.k.a. graphics pass-through).
Comment 4 Chris Wilson 2017-01-06 11:07:22 UTC
There are two issues here. The first is a general memory corruption in gvt and the second is invalid gvt emulation.
Comment 5 Dorota Czaplejewicz 2017-01-30 17:06:52 UTC
Terrence, what kernel configs/modules are you using?

Did you check with any other host kernel (e.g. stock Ubuntu)?

If the problem persists, the host and guest commit IDs would help.
Comment 6 Terrence Xu 2017-02-03 05:56:37 UTC
I can still reproduce this issue using the newest drm-tip code as the Ubuntu guest kernel.

Host 
repo: kvm.git
commit: 0c744ea Linux 4.10-rc2

Guest
repo: drm-intel.git
commit: 0f01216 drm-tip: 2017y-02m-02d-19h-49m-15s UTC integration manifest


[    0.516920] kvm: no hardware support
[    1.692850] [drm:intel_sbi_read] *ERROR* timeout waiting for SBI to complete read transaction
[    1.795851] [drm:intel_sbi_write] *ERROR* timeout waiting for SBI to complete write transaction
[    6.900039] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[    6.900956] IP: reset_common_ring+0x9a/0x100
[    6.901386] PGD 36248067
[    6.901386] PUD 36247067
[    6.901680] PMD 0
[    6.902128]
[    6.902539] Oops: 0000 [#1] PREEMPT SMP
[    6.902957] Modules linked in: e1000
[    6.903326] CPU: 0 PID: 21 Comm: kworker/0:1 Not tainted 4.10.0-rc6+ #8
[    6.904035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[    6.905240] Workqueue: events_long i915_hangcheck_elapsed
[    6.905830] task: ffff88007d279b80 task.stack: ffffc900000a8000
[    6.906425] RIP: 0010:reset_common_ring+0x9a/0x100
[    6.906950] RSP: 0018:ffffc900000abb98 EFLAGS: 00010246
[    6.907478] RAX: 0000000000000000 RBX: ffff880036e54000 RCX: 0000000000000008
[    6.908225] RDX: 0000000000003fd8 RSI: ffff880079bc8000 RDI: 0000000000000000
[    6.908987] RBP: ffffc900000abbb0 R08: 0000000000000001 R09: ffffc900100010a0
[    6.909700] R10: ffff88003670d188 R11: 0000000000000040 R12: ffff88007d30f600
[    6.910451] R13: ffff88007c420000 R14: ffff88007d30f600 R15: 000000000000001a
[    6.911199] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    6.912046] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.912616] CR2: 0000000000000070 CR3: 0000000036249000 CR4: 00000000003406f0
[    6.913376] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.914132] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.914846] Call Trace:
[    6.915135]  i915_gem_reset_finish+0x229/0x3a0
[    6.915583]  ? intel_uncore_forcewake_put+0x48/0x60
[    6.916112]  i915_reset+0xd5/0x160
[    6.916453]  i915_reset_and_wakeup+0xe9/0x150
[    6.916901]  i915_handle_error+0x1a0/0x210
[    6.917345]  ? scnprintf+0x3d/0x70
[    6.917684]  hangcheck_declare_hang+0xcb/0xf0
[    6.918173]  ? intel_engine_get_active_head+0xb4/0xe0
[    6.918668]  i915_hangcheck_elapsed+0x27f/0x2b0
[    6.919172]  process_one_work+0x13d/0x4a0
[    6.919576]  worker_thread+0x48/0x4e0
[    6.919961]  ? _raw_write_unlock_irqrestore+0x2e/0x60
[    6.920506]  ? preempt_count_sub+0x4c/0x80
[    6.920920]  kthread+0x101/0x140
[    6.921288]  ? process_one_work+0x4a0/0x4a0
[    6.921705]  ? kthread_create_on_node+0x40/0x40
[    6.922209]  ret_from_fork+0x31/0x40
[    6.922570] Code: 48 8b 83 80 00 00 00 c7 40 3c ff ff ff ff 48 8b bb 80 00 00 00 e8 97 3b 00 00 8b 05 75 b6 9f 00 85 c0 75 5d 49 8b bd 88 02 00 00 <48> 8b 47 70 48 39 43 70 74 3d 48 85 ff 74 06 3e 83 2f 01 74 48
[    6.924526] RIP: reset_common_ring+0x9a/0x100 RSP: ffffc900000abb98
[    6.925153] CR2: 0000000000000070
[    6.925526] ---[ end trace 9c68741eebecd572 ]---
Comment 7 Terrence Xu 2017-02-03 06:10:39 UTC
Created attachment 129311
kernel config file

Attach the kernel config file.
Comment 8 Terrence Xu 2017-02-03 06:12:30 UTC
(In reply to Terrence Xu from comment #7)
> Created attachment 129311
> kernel config file
> 
> Attach the kernel config file.

This is the guest kernel config file for drm-intel.
Comment 9 Terrence Xu 2017-02-03 06:27:05 UTC
Created attachment 129312
dmesg-guest-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s

Attached the newest guest dmesg log showing the panic.
Comment 10 Chuanxiao Dong 2017-02-07 03:33:24 UTC
This is GVT-d pass-through: the GPU is unbound from the i915 driver and bound to the vfio-pci driver. QA did not set "i915.enable_gvt=1" on either the host or the guest, so no GVT code is involved.

This issue is a regression introduced in 4.9.0; the tested 4.8.0-rc2+ kernel does not have it. I suggest the i915 team help investigate this regression.
Comment 11 Daniel Vetter 2017-02-09 12:16:22 UTC
Regression = we need the bisect.
Comment 12 Dorota Czaplejewicz 2017-02-09 18:42:19 UTC
I'm having trouble reproducing this so far.

Since the guest crashes in i915 functions, GVT-d must be enabled. What host configuration is needed for that?
Does the issue happen when the display manager is not started?
Comment 13 Tomeu Vizoso 2017-02-10 07:09:19 UTC
(In reply to Terrence Xu from comment #9)
> Created attachment 129312
> dmesg-guest-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s
> 
> Attach the newest guest dmesg guest log with panic.

Can you boot with slub_debug and attach the whole dmesg?
Comment 14 Terrence Xu 2017-02-10 15:16:15 UTC
Created attachment 129481
dmesg-guest-slubdebug-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s

Here is the guest dmesg log from booting the guest with "slub_debug=FPZU,kmalloc-1024".
Comment 15 Tomeu Vizoso 2017-02-13 09:44:41 UTC
(In reply to Terrence Xu from comment #14)
> > Created attachment 129481
> dmesg-guest-slubdebug-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s
> 
> Here is the guest dmesg log for "Boot up guest with
> slub_debug=FPZU,kmalloc-1024”.

Sorry, but that log isn't really that useful.

It's not complete, please attach the *whole* kernel output (first line should start with "Linux version").

Please make sure the cmd line args include drm.debug=0xe.

Please use slub_debug without any further options, or if you have a good reason to think those should be enough, please explain.

Also, I think it would be good if this bug contained more detailed instructions on how to reproduce the problem.
Comment 16 Terrence Xu 2017-02-14 05:11:07 UTC
The first bad commit is:

commit 821ed7df6e2a1dbae243caebcfe21a0a4329fca0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 9 14:11:53 2016 +0100

    drm/i915: Update reset path to fix incomplete requests
    
    Update reset path in preparation for engine reset which requires
    identification of incomplete requests and associated context and fixing
    their state so that engine can resume correctly after reset.
    
    The request that caused the hang will be skipped and head is reset to the
    start of breadcrumb. This allows us to resume from where we left-off.
    Since this request didn't complete normally we also need to cleanup elsp
    queue manually. This is vital if we employ nonblocking request
    submission where we may have a web of dependencies upon the hung request
    and so advancing the seqno manually is no longer trivial.
    
    ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS
    
    We change the way we count pending batches. Only the active context
    involved in the reset is marked as either innocent or guilty, and not
    mark the entire world as pending. By inspection this only affects
    igt/gem_reset_stats (which assumes implementation details) and not
    piglit.
    
    ARB_robustness gives this guide on how we expect the user of this
    interface to behave:
    
     * Provide a mechanism for an OpenGL application to learn about
       graphics resets that affect the context.  When a graphics reset
       occurs, the OpenGL context becomes unusable and the application
       must create a new context to continue operation. Detecting a
       graphics reset happens through an inexpensive query.
    
    And with regards to the actual meaning of the reset values:
    
       Certain events can result in a reset of the GL context. Such a reset
       causes all context state to be lost. Recovery from such events
       requires recreation of all objects in the affected context. The
       current status of the graphics reset state is returned by
    
    	enum GetGraphicsResetStatusARB();
    
       The symbolic constant returned indicates if the GL context has been
       in a reset state at any point since the last call to
       GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
       has not been in a reset state since the last call.
       GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
       that is attributable to the current GL context.
       INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
       is not attributable to the current GL context.
       UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
       cause is unknown.
    
    The language here is explicit in that we must mark up the guilty batch,
    but is loose enough for us to relax the innocent (i.e. pending)
    accounting as only the active batches are involved with the reset.
    
    In the future, we are looking towards single engine resetting (with
    minimal locking), where it seems inappropriate to mark the entire world
    as innocent since the reset occurred on a different engine. Reducing the
    information available means we only have to encounter the pain once, and
    also reduces the information leaking from one context to another.
    
    v2: Legacy ringbuffer submission required a reset following hibernation,
    or else we restore stale values to the RING_HEAD and walked over
    stolen garbage.
    
    v3: GuC requires replaying the requests after a reset.
    
    v4: Restore engine IRQ after reset (so waiters will be woken!)
        Rearm hangcheck if resetting with a waiter.
    
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk
Comment 17 Terrence Xu 2017-02-14 05:50:47 UTC
(In reply to Tomeu Vizoso from comment #15)
> (In reply to Terrence Xu from comment #14)
> > Created attachment 129481
> > dmesg-guest-slubdebug-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s
> > 
> > Here is the guest dmesg log for "Boot up guest with
> > slub_debug=FPZU,kmalloc-1024”.
> 
> Sorry, but that log isn't really that useful.
> 
> It's not complete, please attach the *whole* kernel output (first line
> should start with "Linux version").
> 
> Please make sure the cmd line args include drm.debug=0xe.
> 
> Please use slub_debug without any further options, or if you have a good
> reason to think those should be enough, please explain.
> 
> Also, I think it would be good if this bug contained more detailed
> instructions on how to reproduce the problem.

After setting drm.debug=0xe and slub_debug=FMZU, I got the same log as in the attachment above.
This is actually the full log I could capture, since it is the guest dmesg log, not the host dmesg log.
I added console=ttyS0,115200,8n1 to the guest kernel command line in GRUB.
On the host, I boot the guest as follows:
modprobe kvm
modprobe kvm_intel
modprobe vfio
modprobe vfio_pci
echo "0000:00:02.0" > /sys/bus/pci/devices/0000:00:02.0/driver/unbind
echo "8086 1626" > /sys/bus/pci/drivers/vfio-pci/new_id   # "8086 1626" is the vendor/device ID reported by 'lspci -n -s 00:02.0'
qemu-system-x86_64 -enable-kvm -vga cirrus -m 2048 -hda /home/testrunner/ubuntu-16.04.img -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x6 -usb -usbdevice tablet -net nic,macaddr=00:AA:BB:AB:DE:00 -net tap,script=/etc/qemu-ifup -serial stdio -cpu host > 3.log 2>&1 &
Comment 18 Terrence Xu 2017-02-14 15:08:31 UTC
Created attachment 129601
dmesg-guest-full-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s

Finally I fetched the full guest dmesg logs!
As attachment: dmesg-guest-full-20170203-drm-tip: 2017y-02m-02d-19h-49m-15s
Comment 19 Terrence Xu 2017-02-15 08:34:31 UTC
Created attachment 129619
dmesg-guest-full-20170214-drm-tip: 2017y-02m-14d-22h-44m-17s

Updated the full guest dmesg log for the drm-nightly-2017-02-14 build.
Comment 20 Tomeu Vizoso 2017-02-15 09:19:52 UTC
(In reply to Terrence Xu from comment #19)
> Created attachment 129619
> dmesg-guest-full-20170214-drm-tip: 2017y-02m-14d-22h-44m-17s
> 
> Update the full guest dmesg log for drm-nightly-2017-02-14 version.

Thanks!

I won't be able to get back to this today, but in the meantime, could you please see what happens when you boot with i915.enable_rc6=0?

Also, could you figure out which code line is causing the oops?
Comment 21 Terrence Xu 2017-02-15 13:37:03 UTC
(In reply to Tomeu Vizoso from comment #20)
> (In reply to Terrence Xu from comment #19)
> > Created attachment 129619
> > dmesg-guest-full-20170214-drm-tip: 2017y-02m-14d-22h-44m-17s
> > 
> > Update the full guest dmesg log for drm-nightly-2017-02-14 version.
> 
> Thanks!
> 
> Today won't be able to get back to this, but in the meantime, could you
> please see what happens when you boot with i915.enable_rc6=0?
Same result, with the same error log as before.
> Also, could you figure out which code line is causing the oops?
The NULL pointer dereference is triggered in function "reset_common_ring", line #1395 of "intel_lrc.c":
if (request->ctx != port[0].request->ctx)
Here port[0].request itself is NULL (RDI = 0 in the register dump), so reading its ctx member faults at address 0x70 (CR2).
Comment 22 Dorota Czaplejewicz 2017-02-16 16:35:46 UTC
I can confirm the bug at the same commit. One requirement for reproducing it is intel_iommu=on on the host kernel command line.
Comment 23 XiongZhang 2017-02-27 01:47:10 UTC

*** This bug has been marked as a duplicate of bug 99028 ***
