Skip to content
  • Thomas Gleixner's avatar
    infrastructure to debug (dynamic) objects · 3ac7fe5a
    Thomas Gleixner authored
    
    
    We can see an ever repeating problem pattern with objects of any kind in the
    kernel:
    
    1) freeing of active objects
    2) reinitialization of active objects
    
    Both problems can be hard to debug because the crash happens at a point where
    we have no chance to decode the root cause anymore.  One problem spot are
    kernel timers, where the detection of the problem often happens in interrupt
    context and usually causes the machine to panic.
    
    While working on a timer related bug report I had to hack specialized code
    into the timer subsystem to get a reasonable hint for the root cause.  This
    debug hack was fine for temporary use, but far from a mergeable solution due
    to the intrusiveness into the timer code.
    
    The code further lacked the ability to detect and report the root cause
    instantly and keep the system operational.
    
    Keeping the system operational is important to get hold of the debug
    information without special debugging aids like serial consoles and special
    knowledge of the bug reporter.
    
    The problems described above are not restricted to timers, but timers tend to
    expose it usually in a full system crash.  Other objects are less explosive,
    but the symptoms caused by such mistakes can be even harder to debug.
    
    Instead of creating specialized debugging code for the timer subsystem a
    generic infrastructure is created which allows developers to verify their code
    and provides an easy to enable debug facility for users in case of trouble.
    
    The debugobjects core code keeps track of operations on static and dynamic
    objects by inserting them into a hashed list and sanity checking them on
    object operations and provides additional checks whenever kernel memory is
    freed.
    
    The tracked object operations are:
    - initializing an object
    - adding an object to a subsystem list
    - deleting an object from a subsystem list
    
    Each operation is sanity checked before the operation is executed and the
    subsystem specific code can provide a fixup function which allows to prevent
    the damage of the operation.  When the sanity check triggers a warning message
    and a stack trace is printed.
    
    The list of operations can be extended if the need arises.  For now it's
    limited to the requirements of the first user (timers).
    
    The core code enqueues the objects into hash buckets.  The hash index is
    generated from the address of the object to simplify the lookup for the check
    on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
    global lock.
    
    The debug code can be compiled in without being active.  The runtime overhead
    is minimal and could be optimized by asm alternatives.  A kernel command line
    option enables the debugging code.
    
    Thanks to Ingo Molnar for review, suggestions and cleanup patches.
    
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
    Cc: Greg KH <greg@kroah.com>
    Cc: Randy Dunlap <randy.dunlap@oracle.com>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    3ac7fe5a