Bug 42008

Summary: xserver-xorg-video-intel crash due to double free
Product: xorg Reporter: Thilo-Alexander Ginkel <thilo>
Component: Server/Ext/DRI Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: high CC: chris, eugeni, jeremyhu, przanoni
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard: 2011BRB_Reviewed
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 36141    

Description Thilo-Alexander Ginkel 2011-10-19 12:13:41 UTC
The following issue occurs on my Kubuntu 11.10 system (Lenovo T420s w/ Intel HD Graphics 3000).

With KWin desktop effects enabled, Xorg crashes on every second login.

$ uname -a
Linux orion 3.1.0-0301rc10-generic #201110181253 SMP Tue Oct 18 12:57:33 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

$ lspci -vv
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 21d2
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 49
        Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 4000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915

Full backtrace (based on commit 2608a367acba7247e50754c3daeed09ba2e97d05):
-- 8< --
#0  0x00007fcbf4fa63a5 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = <optimized out>
        selftid = <optimized out>
#1  0x00007fcbf4fa9b0b in __GI_abort () at abort.c:92
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x4, sa_sigaction = 0x4}, sa_mask = {__val = {5, 
              140735046524302, 10, 140513966347903, 3, 140735046516714, 6, 140513966347907, 2, 
              140735046516734, 2, 140513966338901, 1, 140513966347903, 3, 140735046516708}}, sa_flags = 12, 
          sa_restorer = 0x7fcbf50cd683}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fcbf4fde113 in __libc_message (do_abort=2, 
    fmt=0x7fcbf50cf0d8 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
        ap = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7fff6e748960, 
            reg_save_area = 0x7fff6e748870}}
        ap_copy = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7fff6e748960, 
            reg_save_area = 0x7fff6e748870}}
        fd = 2
        on_2 = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
#3  0x00007fcbf4fe8a96 in malloc_printerr (action=3, str=0x7fcbf50cf278 "double free or corruption (!prev)", 
    ptr=<optimized out>) at malloc.c:6283
        buf = "0000000002bf2230"
        cp = <optimized out>
#4  0x00007fcbf4fecd7c in __GI___libc_free (mem=<optimized out>) at malloc.c:3738
        ar_ptr = 0x7fcbf530a1c0
        p = <optimized out>
        hook = <optimized out>
#5  0x00007fcbf31ae5de in i830_dri2_frame_event_drawable_gone (data=0x2bf2230, id=29360131)
    at ../../src/intel_dri.c:678
        resource = 0x2bf2230
#6  0x000000000044ea5c in FreeClientResources (client=0x2d01c10) at ../../dix/resource.c:854
        rtype = <optimized out>
        resources = <optimized out>
        this = 0x2f4b0c0
        j = <optimized out>
#7  0x000000000042f05a in CloseDownClient (client=0x2d01c10) at ../../dix/dispatch.c:3461
        really_close_down = <optimized out>
#8  0x000000000042fb9e in Dispatch () at ../../dix/dispatch.c:441
        clientReady = 0x29f4830
        result = <optimized out>
        client = 0x2d01c10
        nready = 0
        icheck = 0x7f1470
        start_tick = 680
#9  0x00000000004232fe in main (argc=8, argv=<optimized out>, envp=<optimized out>) at ../../dix/main.c:287
        i = <optimized out>
        alwaysCheckForInput = {0, 1}
-- 8< --

I bisected xserver-xorg-video-intel (as available at git://git.debian.org/git/pkg-xorg/driver/xserver-xorg-video-intel, which is what the Ubuntu packages are based on). The first bad commit is:

commit 2608a367acba7247e50754c3daeed09ba2e97d05
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jul 11 16:28:15 2011 +0100

    dri: Prevent abuse of the Resource database
    
    The Resource database is only designed to store a single value for a
    particular type associated with an XID. Due to the asynchronous nature
    of the vblank/flip requests, we would often associate multiple frame
    events with a particular drawable/client. Upon freeing the resource, we
    would not necessarily decouple the right value, leaving a stale pointer
    behind. Later when the client disappeared, we would write through that
    stale pointer upsetting valgrind and causing memory corruption. MDK.
    
    Instead, we need to implement an extra layer for tracking multiple
    frames within a single Resource.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37700
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
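
To illustrate the "extra layer" this commit message describes, here is a minimal sketch, assuming a plain singly linked list hung off the one value the Resource database keeps per XID. The names `frame_event`, `frame_list`, `frame_list_add`, `frame_list_remove` and `frame_list_destroy` are hypothetical and are not taken from xf86-video-intel; the actual driver code may be organised differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; not the structures used by xf86-video-intel. */
struct frame_event {
    int seq;                    /* identifies one vblank/flip request */
    struct frame_event *next;   /* next event pending on the same XID */
};

/* The single value stored in the resource database for one XID. */
struct frame_list {
    struct frame_event *head;
};

/* Queue a new event on the per-XID list; returns NULL if malloc fails. */
struct frame_event *frame_list_add(struct frame_list *list, int seq)
{
    struct frame_event *ev;

    assert(list != NULL);
    ev = malloc(sizeof(*ev));
    if (ev == NULL)
        return NULL;
    ev->seq = seq;
    ev->next = list->head;
    list->head = ev;
    return ev;
}

/* Unlink and free exactly the given event; returns 1 if it was found. */
int frame_list_remove(struct frame_list *list, struct frame_event *ev)
{
    struct frame_event **p;

    for (p = &list->head; *p != NULL; p = &(*p)->next) {
        if (*p == ev) {
            *p = ev->next;
            free(ev);
            return 1;
        }
    }
    return 0;
}

/* The XID went away: free every pending event once; returns the count. */
int frame_list_destroy(struct frame_list *list)
{
    int n = 0;

    while (list->head != NULL) {
        struct frame_event *ev = list->head;
        list->head = ev->next;
        free(ev);
        n++;
    }
    return n;
}
```

Because each completed event is unlinked by its own pointer, finishing one request cannot leave a different request's pointer behind, and tearing down the XID frees each remaining event exactly once.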
Comment 1 Thilo-Alexander Ginkel 2011-10-19 12:21:14 UTC
The corresponding Ubuntu bug is available at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/876762
Comment 2 Chris Wilson 2011-10-19 15:47:11 UTC
http://cgit.freedesktop.org/~ickle/xserver/commit/?id=65a272e7ae9392a5716a620d669ef5261241bc4b

dri2: Register the DRI2DrawableType after server regeneration
The Resource database is reset upon regeneration and so the dri2 module
needs to re-register its RESTYPE for the drawable or else it will
clobber the next unsuspecting user of the database. Fortunately, DRI2 is
loaded late in the initialisation sequence and was last up until
xf86-video-intel started using the Resource database to track
outstanding swaps...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 3 Jeremy Huddleston Sequoia 2011-10-28 18:37:54 UTC
Ping.  Chris, why isn't this merged?  Please send it to xorg-devel for review.
Comment 4 Chris Wilson 2011-10-29 01:27:04 UTC
v1 or v2 is waiting for tlc, id:1314273033-1334-1-git-send-email-chris@chris-wilson.co.uk
Comment 5 Paulo Zanoni 2011-10-29 05:39:09 UTC
Easier way to reproduce the bug on xf86-video-intel and xserver git master (using SandyBridge):

# X -retro :0
# DISPLAY=:0 glxinfo
# DISPLAY=:0 glxinfo

After the second glxinfo, X will crash. Note that no other X clients
may be connected, so that each glxinfo exit triggers a new server generation.

My Tested-by was sent to Chris' patch on the list.

(what does tlc mean? google didn't help me)
Comment 6 Chris Wilson 2011-10-29 05:41:24 UTC
tlc - tender, love and care. ;-)
Comment 7 Jeremy Huddleston Sequoia 2011-11-21 20:26:17 UTC
commit 7972e2dade58158bb98f5b7dc5f873b9fb3446de
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 25 16:04:04 2011 +0100

    dri2: Register the DRI2DrawableType after server regeneration
    
    The Resource database is reset upon regeneration and so the dri2 module
    needs to re-register its RESTYPE for the drawable or else it will
    clobber the next unsuspecting user of the database. Fortunately, DRI2 is
    loaded late in the initialisation sequence and was last up until
    xf86-video-intel started using the Resource database to track
    outstanding swaps...
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
    Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    (cherry picked from commit 34b0e4eee911f8b09a3682a7f1b4c8598ef48b8d)
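
The failure mode described in this commit message can be modelled with a small toy, shown below as a sketch under stated assumptions. Everything prefixed `toy_`, plus `buggy_init` and `fixed_init`, is hypothetical and stands in for the server's real resource-type allocator and the dri2 module's setup path; it is not the X server API and not the patch itself. The model assumes only what the message states: type numbers are handed out afresh in every server generation, and a driver registers its own type after the module.

```c
#include <assert.h>

/* Toy model only; none of these names exist in the X server. */
#define TOY_FIRST_FREE_TYPE 10   /* pretend built-in types occupy 0..9 */
#define TOY_MAX_TYPE        64

static int toy_next_type = TOY_FIRST_FREE_TYPE;  /* per-generation allocator */
static unsigned long toy_generation = 0;         /* bumped on every reset */

/* Start a new server generation: the type allocator starts over. */
void toy_regenerate(void)
{
    toy_generation++;
    toy_next_type = TOY_FIRST_FREE_TYPE;
}

/* Hand out the next free resource type number. */
int toy_create_type(void)
{
    assert(toy_next_type < TOY_MAX_TYPE);
    return toy_next_type++;
}

/* Buggy module: registers once and keeps the cached number forever. */
int buggy_init(void)
{
    static int type = 0;

    if (type == 0)
        type = toy_create_type();
    return type;
}

/* Fixed module: registers again whenever the generation has changed. */
int fixed_init(void)
{
    static int type = 0;
    static unsigned long registered_generation = 0;

    if (registered_generation != toy_generation) {
        type = toy_create_type();
        registered_generation = toy_generation;
    }
    return type;
}

/* Run two generations. In each, the module initialises first and a
 * later-loaded driver then registers its own type. Returns 1 if the
 * driver is handed the module's type number in the second generation.
 * Intended to be called once per module, since the modules keep state. */
int toy_collides(int (*module_init)(void))
{
    int module_type, driver_type;

    toy_regenerate();
    module_type = module_init();
    driver_type = toy_create_type();

    toy_regenerate();
    module_type = module_init();
    driver_type = toy_create_type();
    return module_type == driver_type;
}
```

In this model `toy_collides(buggy_init)` reports a collision while `toy_collides(fixed_init)` does not: the cached type number survives the reset, so the next registrant in the new generation receives the same number.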
