diff --git a/Documentation/ipipe.rst b/Documentation/ipipe.rst
new file mode 100644
index 000000000000..9fba08b66253
--- /dev/null
+++ b/Documentation/ipipe.rst
@@ -0,0 +1,924 @@
+.. include:: <isonum.txt>
+
+===================================
+The Interrupt Pipeline (aka I-pipe)
+===================================
+
+:Copyright: |copy| 2018: Philippe Gerum
+
+Purpose
+=======
+
+Using Linux as a host for lightweight software cores specialized in
+delivering very short and bounded response times has been a popular
+way of supporting real-time applications in the embedded space over
+the years.
+
+This design - known as the *dual kernel* approach - introduces a small
+real-time infrastructure which schedules time-critical activities
+independently from the main kernel. Application threads co-managed by
+this infrastructure still benefit from the ancillary kernel services
+such as virtual memory management, and can also leverage the rich GPOS
+feature set Linux provides such as networking, data storage or GUIs.
+
+Although the real-time infrastructure has to present specific driver
+stack and API implementations to applications, there are nonetheless
+significant upsides to keeping the real-time core separate from the
+GPOS infrastructure:
+
+- because the two kernels are independent, real-time activities are
+  not serialized with GPOS operations internally, removing potential
+  delays which might be induced by the non time-critical
+  work. Likewise, there is no requirement for keeping the GPOS
+  operations fine-grained and highly preemptible at any time, which
+  would otherwise induce noticeable overhead on low-end hardware, due
+  to the requirement for pervasive task priority inheritance and IRQ
+  threading.
+
+- the functional isolation of the real-time infrastructure from the
+  rest of the kernel code restricts common bug hunting to the scope of
+  the smaller kernel, excluding most interactions with the very large
+  GPOS kernel base.
+
+- with a dedicated infrastructure providing a specific, well-defined
+  set of real-time services, applications can unambiguously figure out
+  which API calls are available for supporting time-critical work,
+  excluding all the rest as being potentially non-deterministic with
+  respect to response time.
+
+To support such a *dual kernel system*, we need the kernel to exhibit
+a high-priority execution context, for running out-of-band real-time
+duties concurrently to the regular operations.
+
+.. NOTE:: The I-pipe only introduces the basic mechanisms for hosting
+          such a real-time core, enabling the common programming model
+          for its applications in user-space. It does *not* implement
+          the real-time core per se, which should be provided by a
+          separate kernel component.
+
+The issue of interrupt response time
+====================================
+
+The real-time core has to act upon device interrupts with no delay,
+regardless of the regular kernel operations which may be ongoing when
+the interrupt is received by the CPU.
+
+However, to protect from deadlocks and maintain data integrity, Linux
+normally hard disables interrupts around any critical section of code
+which must not be preempted by interrupt handlers on the same CPU,
+enforcing a strictly serialized execution among those contexts.
+
+The unpredictable delay this may cause before external events can be
+handled is a major roadblock for kernel components requiring
+predictable and very short response times to external events, in the
+range of a few microseconds.
+
+Therefore, there is a basic requirement for prioritizing interrupt
+masking and delivery between the real-time core and GPOS operations,
+while maintaining consistent internal serialization for the kernel.
+
+To address this issue, the I-pipe implements a mechanism called
+*interrupt pipelining*, which virtually turns all device IRQs into
+NMIs so that only NMI-safe interrupt handlers run from the perspective
+of the regular kernel activities.
+
+Two-stage IRQ pipeline
+======================
+
+.. _pipeline:
+Interrupt pipelining is a lightweight approach based on the
+introduction of a separate, high-priority execution stage for running
+out-of-band interrupt handlers immediately upon IRQ receipt, which
+cannot be delayed by the in-band, regular kernel work even if the
+latter serializes the execution by - seemingly - disabling interrupts.
+
+IRQs which have no handlers in the high priority stage may be deferred
+on the receiving CPU until the out-of-band activity has quiesced on
+that CPU. Eventually, the preempted in-band code can resume normally,
+which may involve handling the deferred interrupts.
+
+In other words, interrupts are flowing down from the out-of-band to
+the in-band interrupt stages, which form a two-stage pipeline for
+prioritizing interrupt delivery.
+
+The runtime context of the out-of-band interrupt handlers is known as
+the *head stage* of the pipeline, as opposed to the in-band kernel
+activities sitting on the *root stage*::
+
+                    Out-of-band                 In-band
+                    IRQ handlers()            IRQ handlers()
+               __________   _______________________   ______
+                  .     /  /  .             .     /  /  .
+                  .    /  /   .             .    /  /   .
+                  .   /  /    .             .   /  /    .
+                  ___/  /______________________/  /     .
+     [IRQ] -----> _______________________________/      .
+                  .           .             .           .
+                  .   Head    .             .   Root    .
+                  .   Stage   .             .   Stage   .
+               _____________________________________________
+
+
+A software core may base its own activities on the head stage,
+interposing on specific IRQ events, for delivering real-time
+capabilities to a particular set of applications. Meanwhile, the
+regular kernel operations keep going over the root stage unaffected,
+only delayed by short preemption times for running the out-of-band
+work.
+
+.. NOTE:: Interrupt pipelining is a partial implementation of [#f2]_,
+          in which an interrupt *stage* is a limited form of an
+          operating system *domain*.
+
+Virtual interrupt flag
+----------------------
+
+.. _flag:
+As hinted earlier, predictable response time of out-of-band handlers
+to IRQ receipts requires the in-band kernel work not to be allowed to
+delay them by masking interrupts in the CPU.
+
+However, critical sections delimited this way by the in-band code must
+still be enforced for the *root stage*, so that system integrity is
+not at risk. This means that although out-of-band IRQ handlers may run
+at any time while the *head stage* is accepting interrupts, in-band
+IRQ handlers should be allowed to run only when the root stage is
+accepting interrupts too.
+
+So we need to decouple the interrupt masking and delivery logic which
+applies to the head stage from the one in effect on the root stage, by
+implementing a dual interrupt control mechanism.
+
+To this end, a software logic managing a virtual interrupt flag (aka
+*IPIPE_STALL_FLAG*) is introduced by the interrupt pipeline between
+the hardware and the generic IRQ management layer. This logic can mask
+IRQs from the perspective of the regular kernel work when
+:c:func:`local_irq_save`, :c:func:`local_irq_disable` or any
+lock-controlled masking operations like :c:func:`spin_lock_irqsave` is
+called, while still accepting IRQs from the CPU for immediate delivery
+to out-of-band handlers.
+
+The head stage protects from interrupts by disabling them in the CPU's
+status register, while the root stage disables interrupts only
+virtually. A stage for which interrupts are disabled is said to be
+*stalled*. Conversely, *unstalling* a stage means re-enabling
+interrupts for it.
+
+Obviously, stalling the head stage implicitly means disabling
+further IRQ receipts for the root stage too.
+
+Interrupt deferral for the *root stage*
+---------------------------------------
+
+.. _deferral:
+.. _deferred:
+When the root stage is stalled by setting the virtual interrupt flag,
+the occurrence of any incoming IRQ which was not delivered to the
+*head stage* is recorded into a per-CPU log, postponing its actual
+delivery to the root stage.
+
+The delivery of the interrupt event to the corresponding in-band IRQ
+handler is deferred until the in-band kernel code clears the virtual
+interrupt flag by calling :c:func:`local_irq_enable` or any of its
+variants, which unstalls the root stage. When this happens, the
+interrupt state is resynchronized by playing the log, firing the
+in-band handlers for which an IRQ was set pending.
+
+::
+
+   /* Both stages unstalled on entry */
+   local_irq_save(flags);
+   <IRQx received: no out-of-band handler>
+       (pipeline logs IRQx event)
+   ...
+   local_irq_restore(flags);
+       (pipeline plays IRQx event)
+            handle_IRQx_interrupt();
+
+If the root stage is unstalled at the time of the IRQ receipt, the
+in-band handler is immediately invoked, just like with the
+non-pipelined IRQ model.
+
+.. NOTE:: The principle of deferring interrupt delivery based on a
+          software flag coupled to an event log has been originally
+          described as "Optimistic interrupt protection" in [#f1]_.
+
+Device interrupts virtually turned into NMIs
+--------------------------------------------
+
+From the standpoint of the in-band kernel code (i.e. the code running
+over the *root* interrupt stage), the interrupt pipelining logic
+virtually turns all device IRQs into NMIs, for running out-of-band
+handlers.
+
+.. _re-entry:
+For this reason, out-of-band code may generally **NOT** re-enter
+in-band code, for preventing creepy situations like this one::
+
+   /* in-band context */
+   spin_lock_irqsave(&lock, flags);
+      <IRQx received: out-of-band handler installed>
+         handle_oob_event();
+            /* attempted re-entry to in-band from out-of-band. */
+            in_band_routine();
+               spin_lock_irqsave(&lock, flags);
+               <DEADLOCK>
+               ...
+            ...
+         ...
+   ...
+   spin_unlock_irqrestore(&lock, flags);
+
+Even in the absence of any attempt to take the spinlock recursively, the
+outer in-band code in the example above is entitled to assume that no
+access race can occur on the current CPU while interrupts are
+masked. Re-entering in-band code from an out-of-band handler would
+invalidate this assumption.
+
+In rare cases, we may need to fix up the in-band kernel routines in
+order to allow out-of-band handlers to call them. Typically, atomic_
+helpers are such routines, which serialize in-band and out-of-band
+callers.
+
+Virtual/Synthetic interrupt vectors
+-----------------------------------
+
+.. _synthetic:
+.. _virtual:
+The pipeline introduces an additional type of interrupt, which is
+purely software-originated, with no hardware involvement. These IRQs
+can be triggered by any kernel code. So-called virtual IRQs are
+inherently per-CPU events.
+
+Because the common pipeline flow_ applies to virtual interrupts, it
+is possible to attach them to out-of-band and/or in-band handlers,
+just like device interrupts.
+
+.. NOTE:: virtual interrupts and regular softirqs differ in essence:
+          the latter only exist in the in-band context, and therefore
+          cannot trigger out-of-band activities.
+
+Virtual interrupt vectors are allocated by a call to
+:c:func:`ipipe_alloc_virq`, and conversely released with
+:c:func:`ipipe_free_virq`.
+
+For instance, a virtual interrupt can be used for triggering an
+in-band activity on the root stage from the head stage as follows::
+
+  #include <linux/ipipe.h>
+
+  static void virq_handler(unsigned int virq, void *cookie)
+  {
+        do_in_band_work();
+  }
+
+  void install_virq(void)
+  {
+     unsigned int virq;
+     ...
+     virq = ipipe_alloc_virq();
+     ...
+     ipipe_request_irq(ipipe_root_domain, virq, virq_handler,
+		       handler_arg, NULL);
+  }
+
+An out-of-band handler can schedule the execution of
+:c:func:`virq_handler` like this::
+
+  ipipe_post_irq_root(virq);
+
+Conversely, a virtual interrupt can be handled from the out-of-band
+context::
+
+  static void virq_oob_handler(unsigned int virq, void *cookie)
+  {
+        do_oob_work();
+  }
+
+  void install_virq(void)
+  {
+     unsigned int virq;
+     ...
+     virq = ipipe_alloc_virq();
+     ...
+     ipipe_request_irq(ipipe_head_domain, virq, virq_oob_handler,
+		       handler_arg, NULL);
+  }
+
+Any in-band code can trigger the immediate execution of
+:c:func:`virq_oob_handler` on the head stage as follows::
+
+  ipipe_post_irq_head(virq);
+
+Pipelined interrupt flow
+------------------------
+
+.. _flow:
+When interrupt pipelining is enabled, IRQs are first delivered to the
+pipeline entry point via a call to the generic
+:c:func:`__ipipe_dispatch_irq` routine. Before this happens, the event
+has been propagated through the arch-specific code for handling an IRQ::
+
+    asm_irq_entry
+       -> irqchip_handle_irq()
+          -> ipipe_handle_domain_irq()
+             -> __ipipe_grab_irq()
+                -> __ipipe_dispatch_irq()
+                -> irq_flow_handler()
+                <IRQ delivery logic>
+
+Contrary to the non-pipelined model, the generic IRQ flow handler does
+*not* call the in-band interrupt handler immediately, but only runs
+the irqchip-specific handler for acknowledging the incoming IRQ event
+in the hardware.
+
+.. _Holding interrupt lines:
+If the interrupt is of the *level-triggered*, *fasteoi* or *percpu*
+type, the irqchip is given a chance to hold the interrupt
+line, typically by masking it, until either the out-of-band or
+in-band handler has run. This addresses the following scenario, which
+also happens for a similar reason while an IRQ thread waits to be
+scheduled in, requiring the same kind of provision::
+
+    /* root stage stalled on entry */
+    asm_irq_entry
+       ...
+          -> __ipipe_dispatch_irq()
+             ...
+                <IRQ logged, delivery deferred>
+    asm_irq_exit
+    /*
+     * CPU allowed to accept interrupts again with IRQ cause not
+     * acknowledged in device yet => **IRQ storm**.
+     */
+    asm_irq_entry
+       ...
+    asm_irq_exit
+    asm_irq_entry
+       ...
+    asm_irq_exit
+
+IRQ delivery logic
+------------------
+
+If an out-of-band handler exists for the interrupt received,
+:c:func:`__ipipe_dispatch_irq` invokes it immediately, after switching
+the execution context to the head stage if not current yet.
+
+Otherwise, if the execution context is currently over the root stage
+and unstalled, the pipeline core delivers it immediately to the
+in-band handler.
+
+In all other cases, the interrupt is only marked as pending in the
+per-CPU log, then the interrupt frame is left.
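+
+The following C-like sketch only illustrates the decision described
+above; the helper names (`oob_handler_installed`, `log_pending`, and
+the others) are assumptions for the example, not actual pipeline
+internals::
+
+  /* Hedged sketch of the per-IRQ delivery decision. */
+  void dispatch_irq(unsigned int irq)
+  {
+        if (oob_handler_installed(irq)) {
+                /* Switch to the head stage if not current yet, then
+                   run the out-of-band handler immediately. */
+                run_oob_handler(irq);
+                return;
+        }
+
+        if (on_root_stage() && !root_stage_stalled()) {
+                /* In-band context with interrupts virtually enabled:
+                   deliver to the in-band handler right away. */
+                run_inband_handler(irq);
+                return;
+        }
+
+        /* Otherwise, record the event in the per-CPU log; it will be
+           played when the root stage is unstalled. */
+        log_pending(irq);
+  }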
+
+Alternate scheduling
+====================
+
+The I-pipe promotes the idea that a *dual kernel* system should keep
+the functional overlap between the kernel and the real-time core
+minimal. To this end, a real-time thread should be merely seen as a
+regular task with additional scheduling capabilities guaranteeing very
+low response times.
+
+To support this idea, the I-pipe enables kthreads and regular user
+tasks to run alternatively in the out-of-band execution context
+introduced by the interrupt pipeline_ (aka *head* stage), or the
+common in-band kernel context for GPOS operations (aka *root* stage).
+
+As a result, real-time core applications in user-space benefit from
+the common Linux programming model - including virtual memory
+protection -, and still have access to the regular Linux services for
+carrying out non time-critical work.
+
+Task migration to the head stage
+--------------------------------
+
+Low latency response time to events can be achieved when Linux tasks
+wait for them from the out-of-band execution context. The real-time
+core is responsible for switching a task to such a context as part of
+its task management rules; the I-pipe facilitates this migration with
+dedicated services.
+
+The migration process of a task from the GPOS/in-band context to the
+high-priority, out-of-band context is as follows:
+
+1. :c:func:`__ipipe_migrate_head` is invoked from the migrating task
+   context, with the same prerequisites as for calling
+   :c:func:`schedule` (preemption enabled, interrupts on).
+
+.. _`in-band sleep operation`:
+2. the caller is put to interruptible sleep state (S).
+
+3. before resuming in-band operations, the next task picked by the
+   (regular kernel) scheduler on the same CPU for replacing the
+   migrating task fires :c:func:`ipipe_migration_hook` which the
+   real-time core should override (*__weak* binding). Before the call,
+   the head stage is stalled, interrupts are disabled in the CPU. The
+   root execution stage is still current though.
+
+4. the real-time core's implementation of
+   :c:func:`ipipe_migration_hook` is passed a pointer to the
+   task_struct descriptor of the migrating task. This routine is expected
+   to perform the necessary steps for taking control over the task on
+   behalf of the real-time core, re-scheduling its code appropriately
+   over the head stage. This typically involves resuming it from the
+   `out-of-band suspended state`_ applied during the converse migration
+   path.
+
+5. at some point later, when the migrated task is picked by the
+   real-time scheduler, it resumes execution on the head stage with
+   the register file previously saved by the kernel scheduler in
+   :c:func:`switch_to` at step 1.
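+
+As an illustration of step 4, a real-time core might override the
+migration hook along these lines. This is only a sketch following the
+description above (a *__weak* hook receiving the migrating task
+pointer); `rtcore_resume_task` is a hypothetical core-side helper, not
+an I-pipe service::
+
+  /* Called on behalf of the in-band scheduler, head stage stalled. */
+  void ipipe_migration_hook(struct task_struct *p)
+  {
+        /* Resume @p from the out-of-band suspended state, making it
+           runnable for the real-time scheduler over the head stage. */
+        rtcore_resume_task(p);
+  }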
+
+Task migration to the root stage
+--------------------------------
+
+Sometimes, a real-time thread may want to leave the out-of-band
+context, continuing execution from the in-band context instead, so as
+to:
+
+- run non time-critical (in-band) work involving regular system calls
+  handled by the kernel,
+
+- recover from CPU exceptions, such as handling major memory access
+  faults, for which there is no point in caring about response time,
+  and which therefore make no sense to duplicate in the real-time core
+  anyway.
+
+.. NOTE:: The discussion about exception_ handling covers the last
+   point in detail.
+
+The migration process of a task from the high-priority, out-of-band
+context to the GPOS/in-band context is as follows:
+
+1. the real-time core schedules an in-band handler for execution which
+   should call :c:func:`wake_up_process` to unblock the migrating task
+   from the standpoint of the kernel scheduler. This is the
+   counterpart of the :ref:`in-band sleep operation <in-band sleep
+   operation>` from the converse migration path. A virtual_ IRQ can be
+   used for scheduling such event from the out-of-band context.
+
+.. _`out-of-band suspended state`:
+2. the real-time core suspends execution of the current task from its
+   own standpoint. The real-time scheduler is assumed to be using the
+   common :c:func:`switch_to` routine for switching task contexts.
+
+3. at some point later, the out-of-band context is exited by the
+   current CPU when no more high-priority work is left, causing the
+   preempted in-band kernel code to resume execution on the root
+   stage. The handler scheduled at step 1 eventually runs, waking up
+   the migrating task from the standpoint of the kernel.
+
+4. the migrating task resumes from the tail scheduling code of the
+   real-time scheduler, where it suspended in step 2. Noticing the
+   migration, the real-time core eventually calls
+   :c:func:`__ipipe_reenter_root` for finalizing the transition of the
+   incoming task to the root stage.
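+
+For instance, step 1 may rely on a virtual_ IRQ which the out-of-band
+code posts to the root stage, so that the in-band handler wakes up the
+migrating task once the root stage resumes. This is only a sketch; the
+bookkeeping of the migrating task is purely illustrative::
+
+  /* wakeup_virq was allocated with ipipe_alloc_virq() and bound to
+     the root stage with ipipe_request_irq() beforehand. */
+  static unsigned int wakeup_virq;
+  static struct task_struct *migrating_task;
+
+  static void wakeup_virq_handler(unsigned int virq, void *cookie)
+  {
+        /* Runs in-band once the root stage resumes. */
+        wake_up_process(migrating_task);
+  }
+
+  /* From the out-of-band context (step 1): */
+  ipipe_post_irq_root(wakeup_virq);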
+
+Binding to the real-time core
+-----------------------------
+
+.. _binding:
+The I-pipe facilitates fine-grained per-thread management from the
+real-time core, as opposed to per-process. For this reason, the
+real-time core should at least implement a mechanism for turning a
+regular task into a real-time thread with extended capabilities,
+binding it to the core.
+
+The real-time core should inform the kernel about its intent to
+receive notifications about that task, by calling
+:c:func:`ipipe_enable_notifier` while that task is current.
+
+For this reason, the binding operation is usually carried out by a
+dedicated system call exposed by the real-time core, which a regular
+task would invoke.
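+
+A hedged sketch of such a binding service follows; the entry point
+name and the exact argument passed to :c:func:`ipipe_enable_notifier`
+are assumptions for illustration purposes only::
+
+  /* Hypothetical core-specific "bind me" request, invoked on behalf
+     of the regular task being turned into a real-time thread. */
+  static int rtcore_bind_current(void)
+  {
+        /* Core-specific setup of the extended thread context would
+           go here. */
+        ...
+        /* Ask the kernel to send task event notifications for the
+           caller from now on. */
+        ipipe_enable_notifier(current);
+
+        return 0;
+  }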
+
+.. NOTE:: Whether there should be distinct procedures for binding
+	  processes *and* threads to the real-time core, or only a
+	  thread binding procedure is up to the real-time core
+	  implementation.
+
+Notifications
+-------------
+
+Exception handling
+~~~~~~~~~~~~~~~~~~
+
+.. _exception:
+If a processor exception is raised while the CPU is busy running a
+real-time thread in the out-of-band context (e.g. due to some invalid
+memory access, bad instruction, FPU or alignment error etc), the task
+may have to leave such context immediately if the fault handler is not
+protected against out-of-band interrupts, and therefore cannot be
+properly serialized with out-of-band code.
+
+The I-pipe notifies the real-time core about incoming exceptions early
+from the low-level fault handlers, but only when some out-of-band code
+was running when the exception was taken. The real-time core may then
+take action, such as reconciling the current task's execution context
+with the kernel's expectations before the task may traverse the
+regular fault handling code.
+
+.. HINT:: Enabling debuggers to trace real-time threads involves
+          dealing with debug traps the former may poke into the
+          debuggee's code for breakpointing duties.
+
+The notification is issued by a call to :c:func:`__ipipe_notify_trap`
+which in turn invokes the :c:func:`ipipe_trap_hook` routine the
+real-time core should override for receiving those events (*__weak*
+binding). Interrupts are **disabled** in the CPU when
+:c:func:`ipipe_trap_hook` is called::
+
+     /* out-of-band code running */
+     *bad_pointer = 42;
+        [ACCESS EXCEPTION]
+	   /* low-level fault handler in arch/<arch>/mm */
+           -> do_page_fault()
+	      -> __ipipe_notify_trap(...)
+	         /* real-time core */
+	         -> ipipe_trap_hook(...)
+		    -> forced task migration to root stage
+	   ...
+           -> handle_mm_fault()
+
+.. NOTE:: handling minor memory access faults only requiring quick PTE
+          fixups should not involve switching the current task to the
+          in-band context though. Instead, the fixup code should be
+          made atomic_ for serializing accesses from any context.
+
+System calls
+~~~~~~~~~~~~
+
+A real-time core interfaced with the kernel via the I-pipe may
+introduce its own set of system calls. From the standpoint of the
+kernel, this is a foreign set of calls, which can be distinguished
+unambiguously from regular ones based on an arch-specific marker.
+
+.. HINT:: Syscall numbers from this set might have a different base,
+	  and/or some high-order bit set which regular syscall numbers
+	  would not have.
+
+If a task bound to the real-time core issues any system call,
+regardless of whether the kernel or the real-time core should handle it,
+the latter must be given the opportunity to:
+
+- perform the service directly, possibly switching the caller to
+  out-of-band context first should the request require it.
+
+- pass the request downward to the normal system call path on the root
+  stage, possibly switching the caller to in-band context if needed.
+
+If a regular task (i.e. *not* yet known to the real-time core)
+issues any foreign system call, the real-time core is given a chance
+to handle it. This way, a foreign system call which would initially
+bind a regular task to the real-time core would be delivered to the
+real-time core as expected (see binding_).
+
+The I-pipe intercepts system calls early in the kernel entry code,
+delivering them to the proper handler according to the following
+logic::
+
+     is_foreign(syscall_nr)?
+	    Y: is_bound(task)
+	           Y: -> ipipe_fastcall_hook()
+		   N: -> ipipe_syscall_hook()
+            N: is_bound(task)
+	           Y: -> ipipe_syscall_hook()
+		   N: -> normal syscall handling
+
+:c:func:`ipipe_fastcall_hook` is the fast path for handling foreign
+system calls from tasks already running in out-of-band context.
+
+:c:func:`ipipe_syscall_hook` is a slower path for handling requests
+which might require the caller to switch to the out-of-band context
+first before proceeding.
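+
+For illustration, a foreign syscall number could be distinguished by a
+reserved high-order bit, as hinted above. This is only a sketch of one
+possible marker scheme; the actual encoding is defined by the
+real-time core and the architecture support code::
+
+  /* Hypothetical marker: bit 28 tags core-specific syscall numbers. */
+  #define RTCORE_SYSCALL_BIT  0x10000000
+
+  static inline bool is_foreign_syscall(unsigned long nr)
+  {
+        return (nr & RTCORE_SYSCALL_BIT) != 0;
+  }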
+
+Kernel events
+~~~~~~~~~~~~~
+
+The last set of notifications involves pure kernel events which the
+real-time core may need to know about, as they may affect its own task
+management. Except for IPIPE_KEVT_CLEANUP which is called for *any*
+exiting user-space task, all other notifications are only issued for
+tasks bound to the real-time core (which may involve kthreads).
+
+The notification is issued by a call to :c:func:`__ipipe_notify_kevent`
+which in turn invokes the :c:func:`ipipe_kevent_hook` routine the
+real-time core should override for receiving those events (*__weak*
+binding). Interrupts are **enabled** in the CPU when
+:c:func:`ipipe_kevent_hook` is called.
+
+The notification hook is given the event type code, and a single
+pointer argument which relates to the event type.
+
+The following events are defined (include/linux/ipipe_domain.h):
+
+- IPIPE_KEVT_SCHEDULE(struct task_struct *next)
+
+  sent in preparation for a context switch, right before the memory
+  context is switched to *next*.
+
+- IPIPE_KEVT_SIGWAKE(struct task_struct *target)
+
+  sent when *target* is about to receive a signal. The real-time core
+  may decide to schedule a transition of the recipient to the root
+  stage in order to have it handle that signal asap, which is commonly
+  required for keeping the kernel sane. This notification is always
+  sent from the context of the issuer.
+
+- IPIPE_KEVT_SETAFFINITY(struct ipipe_migration_data *p)
+
+  sent when p->task is about to move to CPU p->dest_cpu.
+
+- IPIPE_KEVT_EXIT(struct task_struct *current)
+
+  sent from :c:func:`do_exit` before the current task has dropped the
+  files and mappings it owns.
+
+- IPIPE_KEVT_CLEANUP(struct mm_struct *mm)
+
+  sent before *mm* is entirely dropped, before the mappings are
+  exited. Per-process resources which might be maintained by the
+  real-time core could be released there, as all threads have exited.
+
+  .. NOTE:: IPIPE_KEVT_SETSCHED is deprecated and should not be used.
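+
+A hedged sketch of a real-time core receiving these events follows.
+The hook prototype shown here is inferred from the description above
+(an event code plus one event-specific pointer); the return convention
+and the core-side helpers are assumptions::
+
+  int ipipe_kevent_hook(int kevent, void *data)
+  {
+        switch (kevent) {
+        case IPIPE_KEVT_SIGWAKE:
+                /* data is the signaled struct task_struct *. */
+                rtcore_kick_to_root(data);      /* hypothetical helper */
+                break;
+        case IPIPE_KEVT_CLEANUP:
+                /* data is the dying struct mm_struct *. */
+                rtcore_drop_process_resources(data);    /* hypothetical */
+                break;
+        default:
+                break;
+        }
+
+        return 0;       /* propagate; return convention is an assumption */
+  }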
+
+Prerequisites
+=============
+
+The interrupt pipeline requires the following features to be available
+from the target kernel:
+
+- Generic IRQ handling
+- Clock event abstraction
+
+Implementation
+==============
+
+The following kernel areas are involved in interrupt pipelining:
+
+- Generic IRQ core
+
+  * IRQ flow handlers
+
+    Generic flow handlers acknowledge the incoming IRQ event in the
+    hardware by calling the appropriate irqchip-specific
+    handler. However, the generic flow_ handlers do not immediately
+    invoke the in-band interrupt handlers, but leave this decision to
+    the pipeline core which calls them, according to the pipelined
+    delivery logic.
+
+- Arch-specific bits
+
+  * CPU interrupt mask handling
+
+    The architecture-specific code which manipulates the interrupt
+    flag in the CPU's state register
+    (i.e. arch/<arch>/include/asm/irqflags.h) is split between real
+    and virtual interrupt control:
+
+    + the *hard_local_irq* level helpers affect the hardware state in
+      the CPU.
+
+    + the *arch_* level helpers affect the virtual interrupt flag_
+      implemented by the pipeline core for controlling the root stage
+      protection against interrupts.
+
+    This means that generic helpers from <linux/irqflags.h> such as
+    :c:func:`local_irq_disable` and :c:func:`local_irq_enable`
+    actually refer to the virtual protection scheme when interrupts
+    are pipelined, implementing interrupt deferral_ for the protected
+    in-band code running over the root stage.
+
+  * Assembly-level IRQ, exception paths
+
+    Since interrupts are only virtually masked by the in-band code,
+    IRQs can still be taken by the CPU although they should not be
+    visible from the root stage when they happen in the following
+    situations:
+
+    + when the virtual protection flag_ is raised, meaning the root
+      stage does not accept IRQs, in which case interrupt _deferral
+      happens.
+
+    + when the CPU runs out-of-band code, regardless of the state of
+      the virtual protection flag.
+
+    In both cases, the low-level assembly code handling incoming IRQs
+    takes a fast exit path unwinding the interrupt frame early,
+    instead of running the common in-band epilogue which checks for
+    task rescheduling opportunities and pending signals.
+
+    Likewise, the low-level fault/exception handling code also takes a
+    fast exit path under the same circumstances. Typically, an
+    out-of-band handler causing a minor page fault should benefit from
+    a lightweight PTE fixup performed by the high-level fault handler,
+    but is not allowed to traverse the rescheduling logic upon return
+    from exception.
+
+- Scheduler core
+
+  * CPUIDLE support
+
+    The logic of the CPUIDLE framework has to account for those
+    specific issues the interrupt pipelining introduces:
+
+    - the kernel might be idle in the sense that no in-band activity
+      is scheduled yet, and planning to shut down the timer device
+      suffering the C3STOP (mis)feature.  However, at the same time,
+      some out-of-band code might wait for a tick event already
+      programmed in the timer hardware it controls via the timer_
+      interposition mechanism.
+
+    - switching the CPU to a power saving state may incur a
+      significant latency, particularly for waking it up before it can
+      handle an incoming IRQ, which is at odds with the purpose of
+      interrupt pipelining.
+
+    Obviously, we don't want the CPUIDLE logic to turn off the
+    hardware timer when C3STOP is in effect for the timer device,
+    which would cause the pending out-of-band event to be
+    lost.
+
+    Likewise, the wake up latency induced by entering a sleep state on
+    a particular hardware may not always be acceptable.
+
+    Since the in-band kernel code does not know about the out-of-band
+    code's plans by design, CPUIDLE calls :c:func:`ipipe_cpuidle_control`
+    to figure out whether the out-of-band system is fine with entering
+    the idle state as well.  This routine should be overridden by the
+    out-of-band code for receiving such notification (*__weak*
+    binding).
+
+    If this hook returns a boolean *true* value, CPUIDLE proceeds as
+    usual. Otherwise, the CPU is simply denied entry into the idle
+    state, leaving the timer hardware enabled.
+
+    .. CAUTION:: If some out-of-band code waiting for an external
+       event cannot cope with the latency that might be induced by the
+       default architecture-specific CPU idling code, then CPUIDLE is
+       not usable and should be disabled at build time.
+
+  * Kernel preemption control (PREEMPT)
+
+    :c:func:`__preempt_schedule_irq` reconciles the virtual interrupt
+    state - which has not been touched by the assembly level code upon
+    kernel entry - with basic assumptions made by the scheduler core,
+    such as entering with interrupts disabled. It should be called by
+    the arch-specific assembly code as a replacement for
+    :c:func:`preempt_schedule_irq`, from the call site dealing with
+    kernel preemption upon return from IRQ or system call.
+
+- Timer management
+
+  * Timer interposition
+
+.. _timer:
+    The timer interposition mechanism is designed for handing over
+    control of the hardware tick device in use by the kernel to an
+    out-of-band timing logic. Typically, a real-time co-kernel would
+    make good use of this feature, for grabbing control over the timer
+    hardware.
+
+    Once some out-of-band logic has grabbed control over the timer
+    device by calling :c:func:`ipipe_select_timers`, it can install
+    its own out-of-band handlers using :c:func:`ipipe_timer_start`.
+    From that point, it must carry out the timing requests from the
+    in-band timer core (e.g. hrtimers) in addition to its own timing
+    duties.
+
+    In other words, once the interposition is set up, the
+    functionality of the tick device is shared between the in-band and
+    out-of-band contexts, with only the latter actually programming
+    the hardware.
+
+    This mechanism is based on the clock event abstraction (`struct
+    clock_event_device`). Clock event devices which may be controlled
+    this way need their drivers to be specifically adapted for such
+    use:
+
+    + the interrupt handler receiving tick IRQs must check with
+      :c:func:`clockevent_ipipe_stolen` whether it actually controls
+      the hardware. A non-zero return from this routine means that it
+      does not, and should therefore skip the timer acknowledge
+      code, which would have run earlier in that case.
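+
+      A hedged sketch of a tick handler adapted this way follows; the
+      argument passed to :c:func:`clockevent_ipipe_stolen` and the
+      device acknowledge helper are assumptions for illustration::
+
+        static irqreturn_t timer_interrupt(int irq, void *dev_id)
+        {
+              struct clock_event_device *evt = dev_id;
+
+              if (!clockevent_ipipe_stolen(evt))
+                      ack_timer_hardware(evt);  /* hypothetical ack */
+
+              evt->event_handler(evt);
+
+              return IRQ_HANDLED;
+        }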
+
+- Generic locking & atomic
+
+  * Generic atomic ops
+
+.. _atomic:
+    The effect of virtualizing interrupt protection must be reversed
+    for atomic helpers in <asm-generic/{atomic|bitops/atomic}.h> and
+    <asm-generic/cmpxchg-local.h>, so that no interrupt can preempt
+    their execution, regardless of the stage their caller lives
+    on.
+
+    This is required to keep those helpers usable on data which
+    might be accessed concurrently from both stages.
+
+    The usual way to revert such virtualization consists of delimiting
+    the protected section with :c:func:`hard_local_irq_save` and
+    :c:func:`hard_local_irq_restore` calls, replacing
+    :c:func:`local_irq_save` and :c:func:`local_irq_restore`
+    respectively.
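+
+    A minimal sketch of such a conversion, assuming a helper shared by
+    in-band and out-of-band callers; the data structure is purely
+    illustrative, and the flags-returning calling convention of
+    :c:func:`hard_local_irq_save` shown here is an assumption::
+
+      static unsigned long shared_counter;
+
+      unsigned long add_and_fetch(unsigned long n)
+      {
+            unsigned long flags, ret;
+
+            /* hard_local_irq_save() really disables IRQs in the CPU,
+               so neither stage can preempt this section. */
+            flags = hard_local_irq_save();
+            shared_counter += n;
+            ret = shared_counter;
+            hard_local_irq_restore(flags);
+
+            return ret;
+      }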
+
+  * Hard spinlocks
+
+    The pipeline core introduces one more spinlock type:
+
+    + *hard* spinlocks manipulate the CPU interrupt mask, and don't
+      affect the kernel preemption state in locking/unlocking
+      operations.
+
+      This type of spinlock is useful for implementing a critical
+      section to serialize concurrent accesses from both in-band and
+      out-of-band contexts, i.e. from root and head stages. Obviously,
+      sleeping into a critical section protected by a hard spinlock
+      would be a very bad idea.
+
+      In other words, hard spinlocks are not subject to virtual
+      interrupt masking, therefore can be used to serialize with
+      out-of-band activities, including from the in-band kernel
+      code. At any rate, those sections ought to be quite short, for
+      keeping latency low.
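+
+      A hedged sketch of a critical section shared between stages
+      follows; the declaration helper name is an assumption and would
+      depend on the actual pipeline API::
+
+        static IPIPE_DEFINE_SPINLOCK(shared_lock); /* assumed helper */
+
+        /* Callable from both in-band and out-of-band contexts. */
+        void update_shared_state(void)
+        {
+              unsigned long flags;
+
+              spin_lock_irqsave(&shared_lock, flags);
+              /* short, non-sleeping critical section */
+              spin_unlock_irqrestore(&shared_lock, flags);
+        }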
+
+- Drivers
+
+  * IRQ chip drivers
+
+    .. _irqchip:
+    irqchip drivers need to be specifically adapted for supporting the
+    pipelined interrupt model. The irqchip descriptor gains additional
+    handlers:
+
+    + irq_chip.irq_hold is an optional handler called by the pipeline
+      core upon events from interrupts of the *level-triggered*,
+      *fasteoi* and *percpu* types. See `Holding interrupt lines`_.
+
+      When specified in the descriptor, irq_chip.irq_hold should
+      perform as follows, depending on the hardware acknowledge logic:
+
+          + level   ->  mask[+ack]
+          + percpu  ->  mask[+ack][+eoi]
+          + fasteoi ->  mask+eoi
+
+      .. CAUTION:: proper acknowledge and/or EOI is important when
+                   holding a line, as those operations may also
+                   decrease the current interrupt priority level for
+                   the CPU, allowing same or lower priority
+                   out-of-band interrupts to be taken while the
+                   initial IRQ might be deferred_ for the root stage.
+
+    + irq_chip.irq_release is the converse operation to
+      irq_chip.irq_hold, releasing an interrupt line from the held
+      state.
+
+      The :c:func:`ipipe_end_irq` routine invokes the available
+      handler for releasing the interrupt line. The pipeline core
+      calls :c:func:`irq_release` automatically for each IRQ which has
+      been accepted by an in-band handler (`IRQ_HANDLED` status). This
+      routine should be called explicitly by out-of-band handlers
+      before returning to their caller.
+
+    `IRQCHIP_PIPELINE_SAFE` must be added to the `flags` member of
+    `struct irq_chip` for a pipeline-aware irqchip driver.
+
+    .. NOTE:: :c:func:`irq_set_chip` will complain loudly with a
+              kernel warning whenever the irqchip descriptor passed
+              does not bear the `IRQCHIP_PIPELINE_SAFE` flag and
+              CONFIG_IPIPE is enabled.
+
+- Misc
+
+  * :c:func:`printk`
+
+    :c:func:`printk` may be called by out-of-band code safely, without
+    incurring extra latency. The output is delayed until the in-band
+    code resumes, and the console driver(s) can handle it.
+
+  * Tracing core
+
+    Tracepoints can be traversed by out-of-band code safely. Dynamic
+    tracing is available to a kernel running the pipelined interrupt
+    model too.
+
+Terminology
+===========
+
+.. _terminology:
+======================   =======================================================
+    Term                                       Definition
+======================   =======================================================
+Head stage               high-priority execution context triggered by out-of-band IRQs
+Root stage               regular kernel context performing GPOS work
+Out-of-band code         code running over the head stage
+In-band code             code running over the root stage
+Scheduler                the regular, Linux kernel scheduler
+Real-time scheduler      the out-of-band task scheduling logic implemented on top of the I-pipe
+======================   =======================================================
+
+Resources
+=========
+
+.. [#f1] Stodolsky, Chen & Bershad; "Fast Interrupt Priority Management in Operating System Kernels"
+    https://www.usenix.org/legacy/publications/library/proceedings/micro93/full_papers/stodolsky.txt
+.. [#f2] Yaghmour, Karim; "ADEOS - Adaptive Domain Environment for Operating Systems"
+    https://www.opersys.com/ftp/pub/Adeos/adeos.pdf
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8ef85139553f..3a2c0255fd94 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -132,8 +132,8 @@ config X86
 	select HAVE_ALIGNED_STRUCT_PAGE		if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HUGE_VMAP		if X86_64 || X86_PAE
-	select HAVE_ARCH_JUMP_LABEL
-	select HAVE_ARCH_JUMP_LABEL_RELATIVE
+	select HAVE_ARCH_JUMP_LABEL		if !IPIPE
+	select HAVE_ARCH_JUMP_LABEL_RELATIVE	if !IPIPE
 	select HAVE_ARCH_KASAN			if X86_64
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
@@ -151,7 +151,7 @@ config X86
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_CMPXCHG_LOCAL
-	select HAVE_CONTEXT_TRACKING		if X86_64
+	select HAVE_CONTEXT_TRACKING		if X86_64 && !IPIPE
 	select HAVE_COPY_THREAD_TLS
 	select HAVE_C_RECORDMCOUNT
 	select HAVE_DEBUG_KMEMLEAK
@@ -173,6 +173,12 @@ config X86
 	select HAVE_IOREMAP_PROT
 	select HAVE_IRQ_EXIT_ON_IRQ_STACK	if X86_64
 	select HAVE_IRQ_TIME_ACCOUNTING
+	select HAVE_IPIPE_SUPPORT		if X86_64
+	select HAVE_IPIPE_TRACER_SUPPORT
+	select IPIPE_HAVE_HOSTRT if IPIPE
+	select IPIPE_HAVE_SAFE_THREAD_INFO if IPIPE
+	select IPIPE_WANT_PTE_PINNING if IPIPE
+	select IPIPE_HAVE_VM_NOTIFIER if IPIPE
 	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
@@ -552,6 +558,7 @@ config X86_UV
 	depends on EFI
 	depends on X86_X2APIC
 	depends on PCI
+	depends on !IPIPE
 	---help---
 	  This option is needed in order to support SGI Ultraviolet systems.
 	  If you don't have one of these, you should say N here.
@@ -758,6 +765,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
 	bool "Enable paravirtualization code"
+	depends on !IPIPE
 	---help---
 	  This changes the kernel so it can modify itself when it is run
 	  under a hypervisor, potentially improving performance significantly
@@ -964,7 +972,7 @@ config CALGARY_IOMMU_ENABLED_BY_DEFAULT
 
 config MAXSMP
 	bool "Enable Maximum number of SMP Processors and NUMA Nodes"
-	depends on X86_64 && SMP && DEBUG_KERNEL
+	depends on X86_64 && SMP && DEBUG_KERNEL && !IPIPE
 	select CPUMASK_OFFSTACK
 	---help---
 	  Enable maximum number of CPUS and NUMA Nodes for this architecture.
@@ -1064,6 +1072,8 @@ config SCHED_MC_PRIO
 
 	  If unsure say Y here.
 
+source "kernel/ipipe/Kconfig"
+
 config UP_LATE_INIT
        def_bool y
        depends on !SMP && X86_LOCAL_APIC
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3f8e22615812..970819cbaac0 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -17,6 +17,7 @@
 #include <linux/tracehook.h>
 #include <linux/audit.h>
 #include <linux/seccomp.h>
+#include <linux/unistd.h>
 #include <linux/signal.h>
 #include <linux/export.h>
 #include <linux/context_tracking.h>
@@ -48,6 +49,22 @@ __visible inline void enter_from_user_mode(void)
 static inline void enter_from_user_mode(void) {}
 #endif
 
+#ifdef CONFIG_IPIPE
+#define disable_local_irqs()	do {	\
+	hard_local_irq_disable();	\
+	trace_hardirqs_off();		\
+} while (0)
+#define enable_local_irqs()	do {	\
+	trace_hardirqs_on();		\
+	hard_local_irq_enable();	\
+} while (0)
+#define check_irqs_disabled()	hard_irqs_disabled()
+#else
+#define disable_local_irqs()	local_irq_disable()
+#define enable_local_irqs()	local_irq_enable()
+#define check_irqs_disabled()	irqs_disabled()
+#endif
+
 static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
 {
 #ifdef CONFIG_X86_64
@@ -143,7 +160,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
 	 */
 	while (true) {
 		/* We have work to do. */
-		local_irq_enable();
+		enable_local_irqs();
 
 		if (cached_flags & _TIF_NEED_RESCHED)
 			schedule();
@@ -168,7 +185,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
 			fire_user_return_notifiers();
 
 		/* Disable IRQs and retry */
-		local_irq_disable();
+		disable_local_irqs();
 
 		cached_flags = READ_ONCE(current_thread_info()->flags);
 
@@ -188,11 +205,23 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 	lockdep_assert_irqs_disabled();
 	lockdep_sys_exit();
 
+again:
 	cached_flags = READ_ONCE(ti->flags);
 
 	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
 		exit_to_usermode_loop(regs, cached_flags);
 
+	if (ipipe_user_intret_notifier_enabled(ti)) {
+		int ret;
+
+		enable_local_irqs();
+		ret = __ipipe_notify_user_intreturn();
+		disable_local_irqs();
+
+		if (ret == 0)
+			goto again;
+	}
+
 	/* Reload ti->flags; we may have rescheduled above. */
 	cached_flags = READ_ONCE(ti->flags);
 
@@ -258,8 +287,8 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
 
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
-	    WARN(irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax))
-		local_irq_enable();
+	    WARN(check_irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax))
+		enable_local_irqs();
 
 	rseq_syscall(regs);
 
@@ -267,10 +296,13 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 	 * First do one-time work.  If these work items are enabled, we
 	 * want to run them exactly once per syscall exit with IRQs on.
 	 */
-	if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS))
+	if (unlikely((!IS_ENABLED(CONFIG_IPIPE) ||
+		      syscall_get_nr(current, regs) <
+				ipipe_root_nr_syscalls(ti)) &&
+		     (cached_flags & SYSCALL_EXIT_WORK_FLAGS)))
 		syscall_slow_exit_work(regs, cached_flags);
 
-	local_irq_disable();
+	disable_local_irqs();
 	prepare_exit_to_usermode(regs);
 }
 
@@ -278,10 +310,23 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
 	struct thread_info *ti;
+	int __maybe_unused ret;
 
 	enter_from_user_mode();
-	local_irq_enable();
+	enable_local_irqs();
 	ti = current_thread_info();
+
+#ifdef CONFIG_IPIPE
+	#define __SYSCALL_MASK (~0)
+	ret = ipipe_handle_syscall(ti, nr & __SYSCALL_MASK, regs);
+	if (ret > 0) {
+		disable_local_irqs();
+		return;
+	}
+	if (ret < 0)
+		goto done;
+#endif
+
 	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
 		nr = syscall_trace_enter(regs);
 
@@ -297,11 +342,47 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 #endif
 	}
 
+#ifdef CONFIG_IPIPE
+done:
+#endif
 	syscall_return_slowpath(regs);
 }
 #endif
 
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_X86_32
+static inline int pipeline_syscall(struct thread_info *ti,
+				   unsigned long nr, struct pt_regs *regs)
+{
+	return ipipe_handle_syscall(ti, nr, regs);
+}
+#else
+static inline int pipeline_syscall(struct thread_info *ti,
+				   unsigned long nr, struct pt_regs *regs)
+{
+	struct pt_regs regs64 = *regs;
+	int ret;
+
+	regs64.di = (unsigned int)regs->bx;
+	regs64.si = (unsigned int)regs->cx;
+	regs64.r10 = (unsigned int)regs->si;
+	regs64.r8 = (unsigned int)regs->di;
+	regs64.r9 = (unsigned int)regs->bp;
+	ret = ipipe_handle_syscall(ti, nr, &regs64);
+	regs->ax = (unsigned int)regs64.ax;
+
+	return ret;
+}
+#endif /* CONFIG_X86_32 */
+#else  /* CONFIG_IPIPE */
+static inline int pipeline_syscall(struct thread_info *ti,
+				   unsigned long nr, struct pt_regs *regs)
+{
+	return 0;
+}
+#endif /* CONFIG_IPIPE */
 /*
  * Does a 32-bit syscall.  Called with IRQs on in CONTEXT_KERNEL.  Does
  * all entry and exit work and returns with IRQs off.  This function is
@@ -312,11 +393,20 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 {
 	struct thread_info *ti = current_thread_info();
 	unsigned int nr = (unsigned int)regs->orig_ax;
+	int ret;
 
 #ifdef CONFIG_IA32_EMULATION
 	ti->status |= TS_COMPAT;
 #endif
 
+	ret = pipeline_syscall(ti, nr, regs);
+	if (ret > 0) {
+		disable_local_irqs();
+		return;
+	}
+	if (ret < 0)
+		goto done;
+
 	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
 		/*
 		 * Subtlety here: if ptrace pokes something larger than
@@ -344,7 +434,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 			(unsigned int)regs->di, (unsigned int)regs->bp);
 #endif /* CONFIG_IA32_EMULATION */
 	}
-
+done:
 	syscall_return_slowpath(regs);
 }
 
@@ -352,7 +442,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 __visible void do_int80_syscall_32(struct pt_regs *regs)
 {
 	enter_from_user_mode();
-	local_irq_enable();
+	enable_local_irqs();
 	do_syscall_32_irqs_on(regs);
 }
 
@@ -376,7 +466,7 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
 
 	enter_from_user_mode();
 
-	local_irq_enable();
+	enable_local_irqs();
 
 	/* Fetch EBP from where the vDSO stashed it. */
 	if (
@@ -394,7 +484,7 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
 		) {
 
 		/* User code screwed up. */
-		local_irq_disable();
+		disable_local_irqs();
 		regs->ax = -EFAULT;
 		prepare_exit_to_usermode(regs);
 		return 0;	/* Keep it simple: use IRET. */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 2ba3d53ac5b1..03bfadbe2fdc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -30,6 +30,7 @@
 #include <asm/hw_irq.h>
 #include <asm/page_types.h>
 #include <asm/irqflags.h>
+#include <asm/ipipe_base.h>
 #include <asm/paravirt.h>
 #include <asm/percpu.h>
 #include <asm/asm.h>
@@ -63,7 +64,12 @@ END(native_usergs_sysret64)
 .endm
 
 .macro TRACE_IRQS_IRETQ
-	TRACE_IRQS_FLAGS EFLAGS(%rsp)
+#ifdef CONFIG_TRACE_IRQFLAGS
+	btl	$9, EFLAGS(%rsp)	/* interrupts off? */
+	jnc	1f
+	TRACE_IRQS_ON_VIRT
+1:
+#endif
 .endm
 
 /*
@@ -77,7 +83,8 @@ END(native_usergs_sysret64)
  * make sure the stack pointer does not get reset back to the top
  * of the debug stack, and instead just reuses the current stack.
  */
-#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_TRACE_IRQFLAGS)
+#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_TRACE_IRQFLAGS) \
+	&& !defined(CONFIG_IPIPE)
 
 .macro TRACE_IRQS_OFF_DEBUG
 	call	debug_stack_set_zero
@@ -332,6 +339,7 @@ END(__switch_to_asm)
  */
 ENTRY(ret_from_fork)
 	UNWIND_HINT_EMPTY
+	HARD_COND_ENABLE_INTERRUPTS
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
@@ -575,8 +583,13 @@ ENTRY(interrupt_entry)
 
 1:
 	ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
-	/* We entered an interrupt context - irqs are off: */
+#ifndef CONFIG_IPIPE
+	/* We entered an interrupt context - irqs are off unless
+	   pipelining is enabled, in which case we defer tracing until
+	   __ipipe_do_sync_stage() where the virtual IRQ state is
+	   updated for the root stage. */
 	TRACE_IRQS_OFF
+#endif
 
 	ret
 END(interrupt_entry)
@@ -604,7 +617,17 @@ common_interrupt:
 	addq	$-0x80, (%rsp)			/* Adjust vector to [-256, -1] range */
 	call	interrupt_entry
 	UNWIND_HINT_REGS indirect=1
+#ifdef CONFIG_IPIPE
+	call	__ipipe_handle_irq
+	testl	%eax, %eax
+	jnz	ret_from_intr
+	LEAVE_IRQ_STACK
+	testb	$3, CS(%rsp)
+	jz	retint_kernel_early
+	jmp	retint_user_early
+#else
 	call	do_IRQ	/* rdi points to pt_regs */
+#endif
 	/* 0(%rsp): old RSP */
 ret_from_intr:
 	DISABLE_INTERRUPTS(CLBR_ANY)
@@ -619,6 +642,7 @@ ret_from_intr:
 GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
+retint_user_early:
 	TRACE_IRQS_IRETQ
 
 GLOBAL(swapgs_restore_regs_and_return_to_usermode)
@@ -672,12 +696,17 @@ retint_kernel:
 	jnc	1f
 	cmpl	$0, PER_CPU_VAR(__preempt_count)
 	jnz	1f
+#ifdef CONFIG_IPIPE
+	call	__ipipe_preempt_schedule_irq
+#else
 	call	preempt_schedule_irq
+#endif
 1:
 #endif
 	/*
 	 * The iretq could re-enable interrupts:
 	 */
+	retint_kernel_early:
 	TRACE_IRQS_IRETQ
 
 GLOBAL(restore_regs_and_return_to_kernel)
@@ -796,6 +825,28 @@ _ASM_NOKPROBE(common_interrupt)
 /*
  * APIC interrupts.
  */
+#ifdef CONFIG_IPIPE
+.macro apicinterrupt2 num sym
+ENTRY(\sym)
+	UNWIND_HINT_IRET_REGS
+	ASM_CLAC
+	pushq	$~(\num)
+.Lcommon_\sym:
+	call	interrupt_entry
+	UNWIND_HINT_REGS indirect=1
+	call	__ipipe_handle_irq
+	testl	%eax, %eax
+	jnz	ret_from_intr
+	LEAVE_IRQ_STACK
+	testb	$3, CS(%rsp)
+	jz	retint_kernel_early
+	jmp	retint_user_early
+END(\sym)
+.endm
+.macro apicinterrupt3 num sym do_sym
+apicinterrupt2 \num \sym
+.endm
+#else /* !CONFIG_IPIPE */
 .macro apicinterrupt3 num sym do_sym
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS
@@ -808,6 +859,7 @@ ENTRY(\sym)
 END(\sym)
 _ASM_NOKPROBE(\sym)
 .endm
+#endif /* !CONFIG_IPIPE */
 
 /* Make sure APIC interrupt handlers end up in the irqentry section: */
 #define PUSH_SECTION_IRQENTRY	.pushsection .irqentry.text, "ax"
@@ -853,6 +905,14 @@ apicinterrupt THERMAL_APIC_VECTOR		thermal_interrupt		smp_thermal_interrupt
 apicinterrupt CALL_FUNCTION_SINGLE_VECTOR	call_function_single_interrupt	smp_call_function_single_interrupt
 apicinterrupt CALL_FUNCTION_VECTOR		call_function_interrupt		smp_call_function_interrupt
 apicinterrupt RESCHEDULE_VECTOR			reschedule_interrupt		smp_reschedule_interrupt
+#ifdef CONFIG_IPIPE
+apicinterrupt2 IPIPE_RESCHEDULE_VECTOR		ipipe_reschedule_interrupt
+apicinterrupt2 IPIPE_CRITICAL_VECTOR		ipipe_critical_interrupt
+#endif
+#endif
+
+#ifdef CONFIG_IPIPE
+apicinterrupt2 IPIPE_HRTIMER_VECTOR		ipipe_hrtimer_interrupt
 #endif
 
 apicinterrupt ERROR_APIC_VECTOR			error_interrupt			smp_error_interrupt
@@ -867,8 +927,51 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
  */
 #define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + (x) * 8)
 
-.macro idtentry_part do_sym, has_error_code:req, read_cr2:req, paranoid:req, shift_ist=-1, ist_offset=0
 
+/*
+ * occupy r13,r14,r15. r12 used for prevent clobbering of saved CR2 value.
+ */
+.macro ipipe_idtentry_prologue paranoid=0 trapnr=-1 skip_label=-invalid-
+#ifdef CONFIG_IPIPE
+	movq    EFLAGS(%rsp), %r14		/* regs->flags */
+	movq    %rsp, %rdi			/* pt_regs pointer */
+	movl    $\trapnr, %esi			/* trap number */
+	subq    $8, %rsp
+	movq    %rsp, %rdx			/* &flags */
+	call    __ipipe_trap_prologue
+	popq    %r13
+	mov     %rax, %r15			/* save propagation status */
+	.if \paranoid == 0			/* paranoid may not skip handler */
+	testl   %eax, %eax
+	jg      \skip_label			/* skip regular handler if > 0 */
+	.endif
+#endif
+.endm
+
+.macro ipipe_idtentry_epilogue paranoid=0 skip_label=-invalid-
+#ifdef CONFIG_IPIPE
+	testl   %r15d, %r15d
+	jnz     1000f
+	movq    %rsp, %rdi			/* pt_regs pointer */
+	movq    %r13, %rsi			/* &flags from prologue */
+	movq    %r14, %rdx			/* original regs->flags before fixup */
+	call    __ipipe_trap_epilogue
+1000:
+	.if \paranoid == 0			/* paranoid implies normal epilogue */
+	testl   %r15d, %r15d
+	jz      1001f
+\skip_label:
+	UNWIND_HINT_REGS
+	DISABLE_INTERRUPTS(CLBR_ANY)
+	testb   $3, CS(%rsp)
+	jz      retint_kernel_early
+	jmp     retint_user_early
+	.endif
+1001:
+#endif
+.endm
+
+.macro idtentry_part do_sym, has_error_code:req, read_cr2:req, trapnr:req, paranoid:req, shift_ist=-1, ist_offset=0
 	.if \paranoid
 	call	paranoid_entry
 	/* returned flag: ebx=0: need swapgs on exit, ebx=1: don't need it */
@@ -899,6 +1002,8 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 .Lfrom_kernel_no_context_tracking_\@:
 	.endif
 
+	ipipe_idtentry_prologue paranoid=\paranoid trapnr=\trapnr skip_label=kernel_skip_\@
+
 	movq	%rsp, %rdi			/* pt_regs pointer */
 
 	.if \has_error_code
@@ -918,6 +1023,8 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 
 	call	\do_sym
 
+	ipipe_idtentry_epilogue paranoid=\paranoid skip_label=kernel_skip_\@
+
 	.if \shift_ist != -1
 	addq	$\ist_offset, CPU_TSS_IST(\shift_ist)
 	.endif
@@ -969,7 +1076,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
  * @paranoid == 2 is special: the stub will never switch stacks.  This is for
  * #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
  */
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0 read_cr2=0
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ist_offset=0 create_gap=0 read_cr2=0 trapnr=-1
 ENTRY(\sym)
 	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 
@@ -1007,7 +1114,7 @@ ENTRY(\sym)
 .Lfrom_usermode_no_gap_\@:
 	.endif
 
-	idtentry_part \do_sym, \has_error_code, \read_cr2, \paranoid, \shift_ist, \ist_offset
+	idtentry_part \do_sym, \has_error_code, \read_cr2, \trapnr, \paranoid, \shift_ist, \ist_offset
 
 	.if \paranoid == 1
 	/*
@@ -1016,26 +1123,26 @@ ENTRY(\sym)
 	 * run in real process context if user_mode(regs).
 	 */
 .Lfrom_usermode_switch_stack_\@:
-	idtentry_part \do_sym, \has_error_code, \read_cr2, paranoid=0
+	idtentry_part \do_sym, \has_error_code, \read_cr2, \trapnr, paranoid=0
 	.endif
 
 _ASM_NOKPROBE(\sym)
 END(\sym)
 .endm
 
-idtentry divide_error			do_divide_error			has_error_code=0
-idtentry overflow			do_overflow			has_error_code=0
-idtentry bounds				do_bounds			has_error_code=0
-idtentry invalid_op			do_invalid_op			has_error_code=0
-idtentry device_not_available		do_device_not_available		has_error_code=0
-idtentry double_fault			do_double_fault			has_error_code=1 paranoid=2 read_cr2=1
-idtentry coprocessor_segment_overrun	do_coprocessor_segment_overrun	has_error_code=0
-idtentry invalid_TSS			do_invalid_TSS			has_error_code=1
-idtentry segment_not_present		do_segment_not_present		has_error_code=1
-idtentry spurious_interrupt_bug		do_spurious_interrupt_bug	has_error_code=0
-idtentry coprocessor_error		do_coprocessor_error		has_error_code=0
-idtentry alignment_check		do_alignment_check		has_error_code=1
-idtentry simd_coprocessor_error		do_simd_coprocessor_error	has_error_code=0
+idtentry divide_error			do_divide_error			has_error_code=0 trapnr=0
+idtentry overflow			do_overflow			has_error_code=0 trapnr=4
+idtentry bounds				do_bounds			has_error_code=0 trapnr=5
+idtentry invalid_op			do_invalid_op			has_error_code=0 trapnr=6
+idtentry device_not_available		do_device_not_available		has_error_code=0 trapnr=7
+idtentry double_fault			do_double_fault			has_error_code=1 paranoid=2 read_cr2=1 trapnr=8
+idtentry coprocessor_segment_overrun	do_coprocessor_segment_overrun	has_error_code=0 trapnr=9
+idtentry invalid_TSS			do_invalid_TSS			has_error_code=1 trapnr=10
+idtentry segment_not_present		do_segment_not_present		has_error_code=1 trapnr=11
+idtentry spurious_interrupt_bug		do_spurious_interrupt_bug	has_error_code=0 trapnr=15
+idtentry coprocessor_error		do_coprocessor_error		has_error_code=0 trapnr=16
+idtentry alignment_check		do_alignment_check		has_error_code=1 trapnr=17
+idtentry simd_coprocessor_error		do_simd_coprocessor_error	has_error_code=0 trapnr=19
 
 
 	/*
@@ -1079,10 +1186,14 @@ EXPORT_SYMBOL(native_load_gs_index)
 ENTRY(do_softirq_own_stack)
 	pushq	%rbp
 	mov	%rsp, %rbp
+	HARD_COND_DISABLE_INTERRUPTS
 	ENTER_IRQ_STACK regs=0 old_rsp=%r11
+	HARD_COND_ENABLE_INTERRUPTS
 	call	__do_softirq
+	HARD_COND_DISABLE_INTERRUPTS
 	LEAVE_IRQ_STACK regs=0
 	leaveq
+	HARD_COND_ENABLE_INTERRUPTS
 	ret
 ENDPROC(do_softirq_own_stack)
 
@@ -1190,24 +1301,28 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 	acrn_hv_callback_vector acrn_hv_vector_handler
 #endif
 
+#ifdef CONFIG_IPIPE
+idtentry debug			do_debug		has_error_code=0	paranoid=1 trapnr=1
+#else
 idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
-idtentry int3			do_int3			has_error_code=0	create_gap=1
-idtentry stack_segment		do_stack_segment	has_error_code=1
+#endif
+idtentry int3			do_int3			has_error_code=0	create_gap=1 trapnr=3
+idtentry stack_segment		do_stack_segment	has_error_code=1	trapnr=12
 
 #ifdef CONFIG_XEN_PV
 idtentry xennmi			do_nmi			has_error_code=0
 idtentry xendebug		do_debug		has_error_code=0
 #endif
 
-idtentry general_protection	do_general_protection	has_error_code=1
-idtentry page_fault		do_page_fault		has_error_code=1	read_cr2=1
+idtentry general_protection	do_general_protection	has_error_code=1	trapnr=13
+idtentry page_fault		do_page_fault		has_error_code=1	read_cr2=1 trapnr=14
 
 #ifdef CONFIG_KVM_GUEST
-idtentry async_page_fault	do_async_page_fault	has_error_code=1	read_cr2=1
+idtentry async_page_fault	do_async_page_fault	has_error_code=1	read_cr2=1 trapnr=14
 #endif
 
 #ifdef CONFIG_X86_MCE
-idtentry machine_check		do_mce			has_error_code=0	paranoid=1
+idtentry machine_check		do_mce			has_error_code=0	paranoid=1 trapnr=18
 #endif
 
 /*
diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index ea5c4167086c..e48c14d34d73 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -39,6 +39,7 @@
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 	THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
+	THUNK trace_hardirqs_on_virt_thunk,trace_hardirqs_on_virt_caller,1
 	THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
 #endif
 
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 19e94af9cc5d..d5210b0476a9 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -445,7 +445,17 @@ static inline void apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v)) {}
 
 extern void apic_ack_irq(struct irq_data *data);
 
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_SMP
+struct irq_data;
+void move_xxapic_irq(struct irq_data *data);
+#endif
+#define ack_APIC_irq() do { } while(0)
+static inline void __ack_APIC_irq(void)
+#else /* !CONFIG_IPIPE */
+#define __ack_APIC_irq() ack_APIC_irq()
 static inline void ack_APIC_irq(void)
+#endif /* CONFIG_IPIPE */
 {
 	/*
 	 * ack_APIC_irq() actually gets compiled as a single instruction
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index 1a8609a15856..3baafeb2dfdc 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -94,7 +94,7 @@ extern void aout_dump_debugregs(struct user *dump);
 
 extern void hw_breakpoint_restore(void);
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_IPIPE)
 DECLARE_PER_CPU(int, debug_stack_usage);
 static inline void debug_stack_usage_inc(void)
 {
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 68a99d2a5f33..6ffdc05ca930 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -309,7 +309,7 @@ static inline void force_reload_TR(void)
  */
 static inline void refresh_tss_limit(void)
 {
-	DEBUG_LOCKS_WARN_ON(preemptible());
+	DEBUG_LOCKS_WARN_ON(!hard_irqs_disabled() && preemptible());
 
 	if (unlikely(this_cpu_read(__tss_limit_invalid)))
 		force_reload_TR();
@@ -326,7 +326,7 @@ static inline void refresh_tss_limit(void)
  */
 static inline void invalidate_tss_limit(void)
 {
-	DEBUG_LOCKS_WARN_ON(preemptible());
+	DEBUG_LOCKS_WARN_ON(!hard_irqs_disabled() && preemptible());
 
 	if (unlikely(test_thread_flag(TIF_IO_BITMAP)))
 		force_reload_TR();
@@ -391,7 +391,7 @@ void alloc_intr_gate(unsigned int n, const void *addr);
 
 extern unsigned long system_vectors[];
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_IPIPE)
 DECLARE_PER_CPU(u32, debug_idt_ctr);
 static inline bool is_debug_idt_enabled(void)
 {
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index b774c52e5411..9ca9f4e4f37e 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -11,6 +11,7 @@
 #ifndef _ASM_X86_FPU_API_H
 #define _ASM_X86_FPU_API_H
 #include <linux/bottom_half.h>
+#include <linux/irqflags.h>
 
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
@@ -30,16 +31,25 @@ extern void fpregs_mark_activate(void);
  * fpu->state and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
  * a random state.
  */
-static inline void fpregs_lock(void)
+static inline unsigned long fpregs_lock(void)
 {
+#ifdef CONFIG_IPIPE
+	return hard_local_irq_save();
+#else
 	preempt_disable();
 	local_bh_disable();
+	return 0;
+#endif
 }
 
-static inline void fpregs_unlock(void)
+static inline void fpregs_unlock(unsigned long flags)
 {
+#ifdef CONFIG_IPIPE
+	hard_local_irq_restore(flags);
+#else
 	local_bh_enable();
 	preempt_enable();
+#endif
 }
 
 #ifdef CONFIG_X86_DEBUG_FPU
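
Under CONFIG_IPIPE, fpregs_lock() above now hard-disables interrupts and hands the saved
flags word back to the caller, so every call site has to keep that word and pass it to
fpregs_unlock(). The pgtable.h hunk further down shows the in-tree pattern in write_pkru();
the stand-alone fragment below is only a sketch of that round trip, with trivial stand-in
stubs in place of the real helpers (it is not kernel code).

#include <assert.h>
#include <stdio.h>

static int hard_irqs_on = 1;			/* stand-in for the CPU interrupt state */

static unsigned long fpregs_lock(void)		/* models the CONFIG_IPIPE branch */
{
	unsigned long flags = hard_irqs_on;	/* hard_local_irq_save() */
	hard_irqs_on = 0;
	return flags;
}

static void fpregs_unlock(unsigned long flags)
{
	hard_irqs_on = flags;			/* hard_local_irq_restore(flags) */
}

int main(void)
{
	unsigned long flags;

	flags = fpregs_lock();
	/* ... touch the in-register FPU/PKRU state here ... */
	fpregs_unlock(flags);

	assert(hard_irqs_on == 1);		/* interrupt state restored as saved */
	printf("fpregs_lock/unlock flags round trip OK\n");
	return 0;
}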
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 00eac7f1529b..1e1dd9523dd3 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -25,6 +25,7 @@
 /*
  * High level FPU state handling functions:
  */
+extern void fpu__initialize(struct fpu *fpu);
 extern void fpu__prepare_read(struct fpu *fpu);
 extern void fpu__prepare_write(struct fpu *fpu);
 extern void fpu__save(struct fpu *fpu);
@@ -647,4 +648,24 @@ static inline void xsetbv(u32 index, u64 value)
 		     : : "a" (eax), "d" (edx), "c" (index));
 }
 
+DECLARE_PER_CPU(bool, in_kernel_fpu);
+
+static inline void kernel_fpu_disable(void)
+{
+	WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, true);
+}
+
+static inline void kernel_fpu_enable(void)
+{
+	WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
+	this_cpu_write(in_kernel_fpu, false);
+}
+
+static inline bool kernel_fpu_disabled(void)
+{
+	return this_cpu_read(in_kernel_fpu);
+}
+
+
 #endif /* _ASM_X86_FPU_INTERNAL_H */
diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
index 89789e8c80f6..dc376e53aa2a 100644
--- a/arch/x86/include/asm/i8259.h
+++ b/arch/x86/include/asm/i8259.h
@@ -26,7 +26,7 @@ extern unsigned int cached_irq_mask;
 #define SLAVE_ICW4_DEFAULT	0x01
 #define PIC_ICW4_AEOI		2
 
-extern raw_spinlock_t i8259A_lock;
+IPIPE_DECLARE_RAW_SPINLOCK(i8259A_lock);
 
 /* the PIC may need a careful delay on some platforms, hence specific calls */
 static inline unsigned char inb_pic(unsigned int port)
diff --git a/arch/x86/include/asm/ipipe.h b/arch/x86/include/asm/ipipe.h
new file mode 100644
index 000000000000..8d8db22ab138
--- /dev/null
+++ b/arch/x86/include/asm/ipipe.h
@@ -0,0 +1,70 @@
+/*   -*- linux-c -*-
+ *   arch/x86/include/asm/ipipe.h
+ *
+ *   Copyright (C) 2007 Philippe Gerum.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ *   USA; either version 2 of the License, or (at your option) any later
+ *   version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __X86_IPIPE_H
+#define __X86_IPIPE_H
+
+#ifdef CONFIG_IPIPE
+
+#define IPIPE_CORE_RELEASE	1
+
+struct ipipe_domain;
+
+struct ipipe_arch_sysinfo {
+};
+
+#define ipipe_processor_id()	raw_smp_processor_id()
+
+/* Private interface -- Internal use only */
+
+#define __ipipe_early_core_setup()	do { } while(0)
+
+#define __ipipe_enable_irq(irq)		irq_to_desc(irq)->chip->enable(irq)
+#define __ipipe_disable_irq(irq)	irq_to_desc(irq)->chip->disable(irq)
+
+#ifdef CONFIG_SMP
+void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd);
+#else
+#define __ipipe_hook_critical_ipi(ipd) do { } while(0)
+#endif
+
+void __ipipe_enable_pipeline(void);
+
+#define __ipipe_root_tick_p(regs)	((regs)->flags & X86_EFLAGS_IF)
+
+#define ipipe_notify_root_preemption()	__ipipe_notify_vm_preemption()
+
+#endif /* CONFIG_IPIPE */
+
+#if defined(CONFIG_SMP) && defined(CONFIG_IPIPE)
+#define __ipipe_move_root_irq(__desc)					\
+	do {								\
+		if (!IS_ERR_OR_NULL(__desc)) {				\
+			struct irq_chip *__chip = irq_desc_get_chip(__desc); \
+			if (__chip->irq_move)				\
+				__chip->irq_move(irq_desc_get_irq_data(__desc)); \
+		}							\
+	} while (0)
+#else /* !(CONFIG_SMP && CONFIG_IPIPE) */
+#define __ipipe_move_root_irq(irq)	do { } while (0)
+#endif /* !(CONFIG_SMP && CONFIG_IPIPE) */
+
+#endif	/* !__X86_IPIPE_H */
diff --git a/arch/x86/include/asm/ipipe_base.h b/arch/x86/include/asm/ipipe_base.h
new file mode 100644
index 000000000000..c09697b1f20c
--- /dev/null
+++ b/arch/x86/include/asm/ipipe_base.h
@@ -0,0 +1,156 @@
+/*   -*- linux-c -*-
+ *   arch/x86/include/asm/ipipe_base.h
+ *
+ *   Copyright (C) 2007-2012 Philippe Gerum.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ *   USA; either version 2 of the License, or (at your option) any later
+ *   version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __X86_IPIPE_BASE_H
+#define __X86_IPIPE_BASE_H
+
+#include <asm/irq_vectors.h>
+#include <asm/bitsperlong.h>
+
+#ifdef CONFIG_X86_32
+/* 32 from IDT + iret_error + mayday trap */
+#define IPIPE_TRAP_MAYDAY	33	/* Internal recovery trap */
+#define IPIPE_NR_FAULTS		34
+#else
+/* 32 from IDT + mayday trap */
+#define IPIPE_TRAP_MAYDAY	32	/* Internal recovery trap */
+#define IPIPE_NR_FAULTS		33
+#endif
+
+#ifdef CONFIG_X86_LOCAL_APIC
+/*
+ * Special APIC interrupts are mapped above the last defined external
+ * IRQ number.
+ */
+#define nr_apic_vectors	        (NR_VECTORS - FIRST_SYSTEM_VECTOR)
+#define IPIPE_FIRST_APIC_IRQ	NR_IRQS
+#define IPIPE_HRTIMER_IPI	ipipe_apic_vector_irq(IPIPE_HRTIMER_VECTOR)
+#ifdef CONFIG_SMP
+#define IPIPE_RESCHEDULE_IPI	ipipe_apic_vector_irq(IPIPE_RESCHEDULE_VECTOR)
+#define IPIPE_CRITICAL_IPI	ipipe_apic_vector_irq(IPIPE_CRITICAL_VECTOR)
+#endif /* CONFIG_SMP */
+#define IPIPE_NR_XIRQS		(NR_IRQS + nr_apic_vectors)
+#define ipipe_apic_irq_vector(irq)  ((irq) - IPIPE_FIRST_APIC_IRQ + FIRST_SYSTEM_VECTOR)
+#define ipipe_apic_vector_irq(vec)  ((vec) - FIRST_SYSTEM_VECTOR + IPIPE_FIRST_APIC_IRQ)
+#else
+#define IPIPE_NR_XIRQS		NR_IRQS
+#endif /* !CONFIG_X86_LOCAL_APIC */
+
+#ifndef __ASSEMBLY__
+
+#include <asm/apicdef.h>
+
+extern unsigned int cpu_khz;
+
+static inline const char *ipipe_clock_name(void)
+{
+	return "tsc";
+}
+
+#define __ipipe_cpu_freq	({ u64 __freq = 1000ULL * cpu_khz; __freq; })
+#define __ipipe_hrclock_freq	__ipipe_cpu_freq
+
+#ifdef CONFIG_X86_32
+
+#define ipipe_read_tsc(t)				\
+	__asm__ __volatile__("rdtsc" : "=A"(t))
+
+#define ipipe_tsc2ns(t)					\
+({							\
+	unsigned long long delta = (t) * 1000000ULL;	\
+	unsigned long long freq = __ipipe_hrclock_freq;	\
+	do_div(freq, 1000);				\
+	do_div(delta, (unsigned)freq + 1);		\
+	(unsigned long)delta;				\
+})
+
+#define ipipe_tsc2us(t)					\
+({							\
+	unsigned long long delta = (t) * 1000ULL;	\
+	unsigned long long freq = __ipipe_hrclock_freq;	\
+	do_div(freq, 1000);				\
+	do_div(delta, (unsigned)freq + 1);		\
+	(unsigned long)delta;				\
+})
+
+static inline unsigned long __ipipe_ffnz(unsigned long ul)
+{
+	__asm__("bsrl %1, %0":"=r"(ul) : "r"(ul));
+	return ul;
+}
+
+#else  /* X86_64 */
+
+#define ipipe_read_tsc(t)  do {		\
+	unsigned int __a,__d;			\
+	asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
+	(t) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \
+} while(0)
+
+#define ipipe_tsc2ns(t)	(((t) * 1000UL) / (__ipipe_hrclock_freq / 1000000UL))
+#define ipipe_tsc2us(t)	((t) / (__ipipe_hrclock_freq / 1000000UL))
+
+static inline unsigned long __ipipe_ffnz(unsigned long ul)
+{
+      __asm__("bsrq %1, %0":"=r"(ul)
+	      :	"rm"(ul));
+      return ul;
+}
+
+#ifdef CONFIG_IA32_EMULATION
+#define ipipe_root_nr_syscalls(ti)	\
+	((ti->status & TS_COMPAT) ? IA32_NR_syscalls : NR_syscalls)
+#endif /* CONFIG_IA32_EMULATION */
+
+#endif	/* X86_64 */
+
+struct pt_regs;
+struct irq_desc;
+struct ipipe_vm_notifier;
+
+static inline unsigned __ipipe_get_irq_vector(int irq)
+{
+#ifdef CONFIG_X86_IO_APIC
+	unsigned int __ipipe_get_ioapic_irq_vector(int irq);
+	return __ipipe_get_ioapic_irq_vector(irq);
+#elif defined(CONFIG_X86_LOCAL_APIC)
+	return irq >= IPIPE_FIRST_APIC_IRQ ?
+		ipipe_apic_irq_vector(irq) : ISA_IRQ_VECTOR(irq);
+#else
+	return ISA_IRQ_VECTOR(irq);
+#endif
+}
+
+void ipipe_hrtimer_interrupt(void);
+
+void ipipe_reschedule_interrupt(void);
+
+void ipipe_critical_interrupt(void);
+
+int __ipipe_handle_irq(struct pt_regs *regs);
+
+void __ipipe_handle_vm_preemption(struct ipipe_vm_notifier *nfy);
+
+extern int __ipipe_hrtimer_irq;
+
+#endif	/* !__ASSEMBLY__ */
+
+#endif	/* !__X86_IPIPE_BASE_H */
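
The ipipe_tsc2ns()/ipipe_tsc2us() macros above only perform an integer scaling of a TSC
delta by the CPU clock derived from cpu_khz. The stand-alone sketch below mirrors the
x86-64 variants of that arithmetic, assuming a hypothetical 2.4 GHz clock purely for
illustration; it is not kernel code.

#include <stdio.h>

typedef unsigned long long u64;

static u64 tsc2ns(u64 t, u64 hrclock_freq)
{
	/* mirrors ipipe_tsc2ns(): (t * 1000UL) / (freq / 1000000UL) */
	return (t * 1000ULL) / (hrclock_freq / 1000000ULL);
}

static u64 tsc2us(u64 t, u64 hrclock_freq)
{
	/* mirrors ipipe_tsc2us(): t / (freq / 1000000UL) */
	return t / (hrclock_freq / 1000000ULL);
}

int main(void)
{
	u64 cpu_khz = 2400000ULL;	/* assumed 2.4 GHz clock */
	u64 freq = 1000ULL * cpu_khz;	/* __ipipe_cpu_freq */

	/* 2,400,000 cycles at 2.4 GHz is 1 ms */
	printf("%llu ns, %llu us\n", tsc2ns(2400000ULL, freq),
	       tsc2us(2400000ULL, freq));
	return 0;
}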
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 889f8b1b5b7f..33fe88ae90be 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -106,13 +106,18 @@
 
 #define LOCAL_TIMER_VECTOR		0xec
 
-#define NR_VECTORS			 256
+/* Interrupt pipeline IPIs */
+#define IPIPE_HRTIMER_VECTOR		0xeb
+#define IPIPE_RESCHEDULE_VECTOR		0xea
+#define IPIPE_CRITICAL_VECTOR		0xe9
 
-#ifdef CONFIG_X86_LOCAL_APIC
-#define FIRST_SYSTEM_VECTOR		LOCAL_TIMER_VECTOR
-#else
-#define FIRST_SYSTEM_VECTOR		NR_VECTORS
-#endif
+/*
+ * I-pipe: Lowest vector number which may be assigned to a special
+ * APIC IRQ. We must know this at build time.
+ */
+#define FIRST_SYSTEM_VECTOR		IPIPE_CRITICAL_VECTOR
+
+#define NR_VECTORS			 256
 
 /*
  * Size the maximum number of interrupts.
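
Together with the ipipe_apic_vector_irq()/ipipe_apic_irq_vector() helpers from
ipipe_base.h above, pinning FIRST_SYSTEM_VECTOR to IPIPE_CRITICAL_VECTOR fixes where the
special APIC vectors land in the pipeline's extended IRQ space at build time. A
stand-alone sketch of that mapping, assuming a hypothetical NR_IRQS of 256 purely for
illustration (not kernel code):

#include <assert.h>

#define FIRST_SYSTEM_VECTOR	0xe9	/* IPIPE_CRITICAL_VECTOR, as above */
#define NR_IRQS			256	/* assumed value, for illustration only */
#define IPIPE_FIRST_APIC_IRQ	NR_IRQS

/* same formulas as ipipe_apic_vector_irq()/ipipe_apic_irq_vector() */
#define apic_vector_irq(vec)	((vec) - FIRST_SYSTEM_VECTOR + IPIPE_FIRST_APIC_IRQ)
#define apic_irq_vector(irq)	((irq) - IPIPE_FIRST_APIC_IRQ + FIRST_SYSTEM_VECTOR)

int main(void)
{
	/* LOCAL_TIMER_VECTOR (0xec) lands three slots above the first APIC IRQ */
	assert(apic_vector_irq(0xec) == NR_IRQS + 3);
	/* the two mappings are inverses of each other */
	assert(apic_irq_vector(apic_vector_irq(0xeb)) == 0xeb);
	return 0;
}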
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8a0e56e1dcc9..f696de55e769 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -8,6 +8,10 @@
 
 #include <asm/nospec-branch.h>
 
+#include <linux/ipipe_trace.h>
+#include <linux/compiler.h>
+#include <asm-generic/ipipe.h>
+
 /* Provide __cpuidle; we can't safely include <linux/cpu.h> */
 #define __cpuidle __attribute__((__section__(".cpuidle.text")))
 
@@ -66,14 +70,76 @@ static inline __cpuidle void native_halt(void)
 	asm volatile("hlt": : :"memory");
 }
 
+static inline int native_irqs_disabled(void)
+{
+	unsigned long flags = native_save_fl();
+
+	return !(flags & X86_EFLAGS_IF);
+}
+
 #endif
 
 #ifdef CONFIG_PARAVIRT_XXL
 #include <asm/paravirt.h>
+#define HARD_COND_ENABLE_INTERRUPTS
+#define HARD_COND_DISABLE_INTERRUPTS
 #else
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 
+#ifdef CONFIG_IPIPE
+
+void __ipipe_halt_root(int use_mwait);
+
+static inline notrace unsigned long arch_local_save_flags(void)
+{
+	unsigned long flags;
+
+	flags = (!ipipe_test_root()) << 9;	/* virtual IF in the X86_EFLAGS_IF bit */
+	barrier();
+	return flags;
+}
+
+static inline notrace void arch_local_irq_restore(unsigned long flags)
+{
+	barrier();
+	ipipe_restore_root(!(flags & X86_EFLAGS_IF));
+}
+
+static inline notrace void arch_local_irq_disable(void)
+{
+	ipipe_stall_root();
+	barrier();
+}
+
+static inline notrace void arch_local_irq_enable(void)
+{
+	barrier();
+	ipipe_unstall_root();
+}
+
+static inline __cpuidle void arch_safe_halt(void)
+{
+	barrier();
+	__ipipe_halt_root(0);
+}
+
+/* Merge virtual+real interrupt mask bits into a single word. */
+static inline unsigned long arch_mangle_irq_bits(int virt, unsigned long real)
+{
+	return (real & ~(1L << 31)) | ((unsigned long)(virt != 0) << 31);
+}
+
+/* Converse operation of arch_mangle_irq_bits() */
+static inline int arch_demangle_irq_bits(unsigned long *x)
+{
+	int virt = (*x & (1L << 31)) != 0;
+	*x &= ~(1L << 31);
+	return virt;
+}
+
+#else /* !CONFIG_IPIPE */
+
 static inline notrace unsigned long arch_local_save_flags(void)
 {
 	return native_save_fl();
@@ -103,6 +169,8 @@ static inline __cpuidle void arch_safe_halt(void)
 	native_safe_halt();
 }
 
+#endif /* !CONFIG_IPIPE */
+
 /*
  * Used when interrupts are already enabled or to
  * shutdown the processor:
@@ -126,6 +194,14 @@ static inline notrace unsigned long arch_local_irq_save(void)
 #define ENABLE_INTERRUPTS(x)	sti
 #define DISABLE_INTERRUPTS(x)	cli
 
+#ifdef CONFIG_IPIPE
+#define HARD_COND_ENABLE_INTERRUPTS	sti
+#define HARD_COND_DISABLE_INTERRUPTS	cli
+#else /* !CONFIG_IPIPE */
+#define HARD_COND_ENABLE_INTERRUPTS
+#define HARD_COND_DISABLE_INTERRUPTS
+#endif /* !CONFIG_IPIPE */
+
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(x)		pushfq; popq %rax
@@ -170,40 +246,156 @@ static inline int arch_irqs_disabled(void)
 
 	return arch_irqs_disabled_flags(flags);
 }
+
+#ifdef CONFIG_IPIPE
+
+static inline unsigned long hard_local_irq_save_notrace(void)
+{
+	unsigned long flags;
+
+	flags = native_save_fl();
+	native_irq_disable();
+
+	return flags;
+}
+
+static inline void hard_local_irq_restore_notrace(unsigned long flags)
+{
+	native_restore_fl(flags);
+}
+
+static inline void hard_local_irq_disable_notrace(void)
+{
+	native_irq_disable();
+}
+
+static inline void hard_local_irq_enable_notrace(void)
+{
+	native_irq_enable();
+}
+
+static inline int hard_irqs_disabled(void)
+{
+	return native_irqs_disabled();
+}
+
+#define hard_irqs_disabled_flags(flags)	arch_irqs_disabled_flags(flags)
+
+#ifdef CONFIG_IPIPE_TRACE_IRQSOFF
+
+static inline void hard_local_irq_disable(void)
+{
+	if (!native_irqs_disabled()) {
+		native_irq_disable();
+		ipipe_trace_begin(0x80000000);
+	}
+}
+
+static inline void hard_local_irq_enable(void)
+{
+	if (native_irqs_disabled()) {
+		ipipe_trace_end(0x80000000);
+		native_irq_enable();
+	}
+}
+
+static inline unsigned long hard_local_irq_save(void)
+{
+	unsigned long flags;
+
+	flags = native_save_fl();
+	if (flags & X86_EFLAGS_IF) {
+		native_irq_disable();
+		ipipe_trace_begin(0x80000001);
+	}
+
+	return flags;
+}
+
+static inline void hard_local_irq_restore(unsigned long flags)
+{
+	if (flags & X86_EFLAGS_IF)
+		ipipe_trace_end(0x80000001);
+
+	native_restore_fl(flags);
+}
+
+#else /* !CONFIG_IPIPE_TRACE_IRQSOFF */
+
+static inline unsigned long hard_local_irq_save(void)
+{
+	return hard_local_irq_save_notrace();
+}
+
+static inline void hard_local_irq_restore(unsigned long flags)
+{
+	hard_local_irq_restore_notrace(flags);
+}
+
+static inline void hard_local_irq_enable(void)
+{
+	hard_local_irq_enable_notrace();
+}
+
+static inline void hard_local_irq_disable(void)
+{
+	hard_local_irq_disable_notrace();
+}
+
+#endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
+
+static inline unsigned long hard_local_save_flags(void)
+{
+	return native_save_fl();
+}
+
+#endif /* CONFIG_IPIPE */
+
 #endif /* !__ASSEMBLY__ */
 
 #ifdef __ASSEMBLY__
 #ifdef CONFIG_TRACE_IRQFLAGS
 #  define TRACE_IRQS_ON		call trace_hardirqs_on_thunk;
+#ifdef CONFIG_IPIPE
+#  define TRACE_IRQS_ON_VIRT    call trace_hardirqs_on_virt_thunk;
+#else
+#  define TRACE_IRQS_ON_VIRT    TRACE_IRQS_ON
+#endif
 #  define TRACE_IRQS_OFF	call trace_hardirqs_off_thunk;
 #else
 #  define TRACE_IRQS_ON
+#  define TRACE_IRQS_ON_VIRT
 #  define TRACE_IRQS_OFF
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 #  ifdef CONFIG_X86_64
-#    define LOCKDEP_SYS_EXIT		call lockdep_sys_exit_thunk
+#    define LOCKDEP_SYS_EXIT	call lockdep_sys_exit_thunk
 #    define LOCKDEP_SYS_EXIT_IRQ \
 	TRACE_IRQS_ON; \
 	sti; \
 	call lockdep_sys_exit_thunk; \
 	cli; \
 	TRACE_IRQS_OFF;
+
 #  else
-#    define LOCKDEP_SYS_EXIT \
+#    define LOCKDEP_SYS_EXIT			\
 	pushl %eax;				\
 	pushl %ecx;				\
 	pushl %edx;				\
+	pushfl;					\
+	sti;					\
 	call lockdep_sys_exit;			\
+	popfl;					\
 	popl %edx;				\
 	popl %ecx;				\
 	popl %eax;
+
 #    define LOCKDEP_SYS_EXIT_IRQ
 #  endif
 #else
 #  define LOCKDEP_SYS_EXIT
 #  define LOCKDEP_SYS_EXIT_IRQ
 #endif
-#endif /* __ASSEMBLY__ */
 
+#endif /* __ASSEMBLY__ */
 #endif
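
arch_mangle_irq_bits()/arch_demangle_irq_bits() above pack the root domain's virtual
stall state into bit 31 of a flags word, well above the bits the hardware EFLAGS uses,
so a single word can carry both the virtual and the real interrupt state. A minimal
user-space round-trip check of that encoding, mirroring the two helpers (not kernel
code):

#include <assert.h>
#include <stdio.h>

static unsigned long mangle_irq_bits(int virt, unsigned long real)
{
	/* same expression as arch_mangle_irq_bits() above */
	return (real & ~(1L << 31)) | ((unsigned long)(virt != 0) << 31);
}

static int demangle_irq_bits(unsigned long *x)
{
	/* same expression as arch_demangle_irq_bits() above */
	int virt = (*x & (1L << 31)) != 0;
	*x &= ~(1L << 31);
	return virt;
}

int main(void)
{
	unsigned long real = 0x200;	/* X86_EFLAGS_IF set: hard IRQs enabled */
	unsigned long word = mangle_irq_bits(1, real);	/* root virtually stalled */
	int virt = demangle_irq_bits(&word);

	assert(virt == 1 && word == real);	/* both states survive the round trip */
	printf("virt=%d real=%#lx\n", virt, word);
	return 0;
}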
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 16ae821483c8..0d9dd08c2122 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -174,7 +174,8 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
 		load_mm_ldt(next);
 #endif
 
-	DEBUG_LOCKS_WARN_ON(preemptible());
+	DEBUG_LOCKS_WARN_ON(preemptible() &&
+			(!IS_ENABLED(CONFIG_IPIPE) || !hard_irqs_disabled()));
 }
 
 void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
@@ -214,6 +215,9 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			       struct task_struct *tsk);
 #define switch_mm_irqs_off switch_mm_irqs_off
 
+#define ipipe_switch_mm_head(prev, next, tsk) \
+	switch_mm_irqs_off(prev, next, tsk)
+
 #define activate_mm(prev, next)			\
 do {						\
 	paravirt_activate_mm((prev), (next));	\
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ea85f23d9e22..d8fb1ef5e699 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -136,6 +136,7 @@ static inline u32 read_pkru(void)
 static inline void write_pkru(u32 pkru)
 {
 	struct pkru_state *pk;
+	unsigned long flags;
 
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return;
@@ -147,11 +148,11 @@ static inline void write_pkru(u32 pkru)
 	 * written to the CPU. The FPU restore on return to userland would
 	 * otherwise load the previous value again.
 	 */
-	fpregs_lock();
+	flags = fpregs_lock();
 	if (pk)
 		pk->pkru = pkru;
 	__write_pkru(pkru);
-	fpregs_unlock();
+	fpregs_unlock(flags);
 }
 
 static inline int pte_young(pte_t pte)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f9453536f9bb..21fb6384ff0f 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -52,10 +52,15 @@
 struct task_struct;
 #include <asm/cpufeature.h>
 #include <linux/atomic.h>
+#include <ipipe/thread_info.h>
 
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	u32			status;		/* thread synchronous flags */
+#ifdef CONFIG_IPIPE
+	unsigned long		ipipe_flags;
+	struct ipipe_threadinfo ipipe_data;
+#endif
 };
 
 #define INIT_THREAD_INFO(tsk)			\
@@ -159,6 +164,17 @@ struct thread_info {
 #define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY)
 #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
 
+/* ti->ipipe_flags */
+#define TIP_HEAD	0	/* Runs in head domain */
+#define TIP_NOTIFY	1	/* Notify head domain about kernel events */
+#define TIP_MAYDAY	2	/* MAYDAY call is pending */
+#define TIP_USERINTRET	3	/* Notify on IRQ/trap return to root userspace */
+
+#define _TIP_HEAD	(1 << TIP_HEAD)
+#define _TIP_NOTIFY	(1 << TIP_NOTIFY)
+#define _TIP_MAYDAY	(1 << TIP_MAYDAY)
+#define _TIP_USERINTRET	(1 << TIP_USERINTRET)
+
 #define STACK_WARN		(THREAD_SIZE/8)
 
 /*
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 8a0c25c6bf09..3d13c4477887 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -15,6 +15,9 @@
  */
 typedef unsigned long long cycles_t;
 
+extern struct clocksource clocksource_tsc;
+#define __ipipe_hostrt_clock	clocksource_tsc
+
 extern unsigned int cpu_khz;
 extern unsigned int tsc_khz;
 
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 61d93f062a36..7957b55944f8 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -7,6 +7,7 @@
 #include <linux/compiler.h>
 #include <linux/kasan-checks.h>
 #include <linux/string.h>
+#include <linux/ipipe.h>
 #include <asm/asm.h>
 #include <asm/page.h>
 #include <asm/smap.h>
@@ -68,7 +69,7 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 static inline bool pagefault_disabled(void);
 # define WARN_ON_IN_IRQ()	\
-	WARN_ON_ONCE(!in_task() && !pagefault_disabled())
+	WARN_ON_ONCE(ipipe_root_p && !in_task() && !pagefault_disabled())
 #else
 # define WARN_ON_IN_IRQ()
 #endif
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3578ad248bc9..621d855431c7 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -79,6 +79,7 @@ obj-y				+= reboot.o
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
 obj-$(CONFIG_PCI)		+= early-quirks.o
+obj-$(CONFIG_IPIPE)		+= ipipe.o
 apm-y				:= apm_32.o
 obj-$(CONFIG_APM)		+= apm.o
 obj-$(CONFIG_SMP)		+= smp.o
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index fce94c799f01..1703f542a6ee 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -35,6 +35,7 @@
 #include <linux/dmi.h>
 #include <linux/smp.h>
 #include <linux/mm.h>
+#include <linux/ipipe_tickdev.h>
 
 #include <asm/trace/irq_vectors.h>
 #include <asm/irq_remapping.h>
@@ -271,10 +272,10 @@ void native_apic_icr_write(u32 low, u32 id)
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(id));
 	apic_write(APIC_ICR, low);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 u64 native_apic_icr_read(void)
@@ -479,16 +480,20 @@ static int lapic_next_deadline(unsigned long delta,
 
 static int lapic_timer_shutdown(struct clock_event_device *evt)
 {
+	unsigned long flags;
 	unsigned int v;
 
 	/* Lapic used as dummy for broadcast ? */
 	if (evt->features & CLOCK_EVT_FEAT_DUMMY)
 		return 0;
 
+	flags = hard_local_irq_save();
 	v = apic_read(APIC_LVTT);
 	v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
 	apic_write(APIC_LVTT, v);
 	apic_write(APIC_TMICT, 0);
+	hard_local_irq_restore(flags);
+
 	return 0;
 }
 
@@ -523,6 +528,17 @@ static void lapic_timer_broadcast(const struct cpumask *mask)
 #endif
 }
 
+#ifdef CONFIG_IPIPE
+static void lapic_itimer_ack(void)
+{
+	__ack_APIC_irq();
+}
+
+static DEFINE_PER_CPU(struct ipipe_timer, lapic_itimer) = {
+	.irq = ipipe_apic_vector_irq(LOCAL_TIMER_VECTOR),
+	.ack = lapic_itimer_ack,
+};
+#endif /* CONFIG_IPIPE */
 
 /*
  * The local apic timer can be used for any function which is CPU local.
@@ -655,6 +671,16 @@ static void setup_APIC_timer(void)
 
 	memcpy(levt, &lapic_clockevent, sizeof(*levt));
 	levt->cpumask = cpumask_of(smp_processor_id());
+#ifdef CONFIG_IPIPE
+	if (!(lapic_clockevent.features & CLOCK_EVT_FEAT_DUMMY))
+		levt->ipipe_timer = this_cpu_ptr(&lapic_itimer);
+	else {
+		static atomic_t once = ATOMIC_INIT(-1);
+		if (atomic_inc_and_test(&once))
+			printk(KERN_INFO
+			       "I-pipe: cannot use LAPIC as a tick device\n");
+	}
+#endif /* CONFIG_IPIPE */
 
 	if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
 		levt->name = "lapic-deadline";
@@ -1294,7 +1320,7 @@ void lapic_shutdown(void)
 	if (!boot_cpu_has(X86_FEATURE_APIC) && !apic_from_smp_config())
 		return;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 
 #ifdef CONFIG_X86_32
 	if (!enabled_via_apicbase)
@@ -1304,7 +1330,7 @@ void lapic_shutdown(void)
 		disable_local_APIC();
 
 
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 /**
@@ -1554,7 +1580,7 @@ static bool apic_check_and_ack(union apic_ir *irr, union apic_ir *isr)
 		 * per set bit.
 		 */
 		for_each_set_bit(bit, isr->map, APIC_IR_BITS)
-			ack_APIC_irq();
+			__ack_APIC_irq();
 		return true;
 	}
 
@@ -2190,7 +2216,7 @@ __visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
 	if (v & (1 << (vector & 0x1f))) {
 		pr_info("Spurious interrupt (vector 0x%02x) on CPU#%d. Acked\n",
 			vector, smp_processor_id());
-		ack_APIC_irq();
+		__ack_APIC_irq();
 	} else {
 		pr_info("Spurious interrupt (vector 0x%02x) on CPU#%d. Not pending!\n",
 			vector, smp_processor_id());
@@ -2642,12 +2668,12 @@ static int lapic_suspend(void)
 		apic_pm_state.apic_cmci = apic_read(APIC_LVTCMCI);
 #endif
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	disable_local_APIC();
 
 	irq_remapping_disable();
 
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 	return 0;
 }
 
@@ -2660,7 +2686,7 @@ static void lapic_resume(void)
 	if (!apic_pm_state.active)
 		return;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 
 	/*
 	 * IO-APIC and PIC have their own resume routines.
@@ -2718,7 +2744,7 @@ static void lapic_resume(void)
 
 	irq_remapping_reenable(x2apic_mode);
 
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 /*
diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 7862b152a052..d3762181070f 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -52,9 +52,9 @@ static void _flat_send_IPI_mask(unsigned long mask, int vector)
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 static void flat_send_IPI_mask(const struct cpumask *cpumask, int vector)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 4b6301946f45..01317c45259f 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -78,7 +78,7 @@
 #define for_each_irq_pin(entry, head) \
 	list_for_each_entry(entry, &head, list)
 
-static DEFINE_RAW_SPINLOCK(ioapic_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(ioapic_lock);
 static DEFINE_MUTEX(ioapic_mutex);
 static unsigned int ioapic_dynirq_base;
 static int ioapic_initialized;
@@ -466,13 +466,19 @@ static void io_apic_sync(struct irq_pin_list *entry)
 	readl(&io_apic->data);
 }
 
+static inline void __mask_ioapic(struct mp_chip_data *data)
+{
+	io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
+}
+
 static void mask_ioapic_irq(struct irq_data *irq_data)
 {
 	struct mp_chip_data *data = irq_data->chip_data;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
-	io_apic_modify_irq(data, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
+	ipipe_lock_irq(irq_data->irq);
+	__mask_ioapic(data);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
@@ -488,6 +494,7 @@ static void unmask_ioapic_irq(struct irq_data *irq_data)
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
 	__unmask_ioapic(data);
+	ipipe_unlock_irq(irq_data->irq);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
@@ -531,14 +538,20 @@ static void __eoi_ioapic_pin(int apic, int pin, int vector)
 	}
 }
 
-static void eoi_ioapic_pin(int vector, struct mp_chip_data *data)
+static void _eoi_ioapic_pin(int vector, struct mp_chip_data *data)
 {
-	unsigned long flags;
 	struct irq_pin_list *entry;
 
-	raw_spin_lock_irqsave(&ioapic_lock, flags);
 	for_each_irq_pin(entry, data->irq_2_pin)
 		__eoi_ioapic_pin(entry->apic, entry->pin, vector);
+}
+
+void eoi_ioapic_pin(int vector, struct mp_chip_data *data)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&ioapic_lock, flags);
+	_eoi_ioapic_pin(vector, data);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
@@ -1206,6 +1219,19 @@ EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
 
 static struct irq_chip ioapic_chip, ioapic_ir_chip;
 
+#ifdef CONFIG_IPIPE
+static void startup_legacy_irq(unsigned irq)
+{
+	unsigned long flags;
+	legacy_pic->mask(irq);
+	flags = hard_local_irq_save();
+	__ipipe_unlock_irq(irq);
+	hard_local_irq_restore(flags);
+}
+#else /* !CONFIG_IPIPE */
+#define startup_legacy_irq(irq) legacy_pic->mask(irq)
+#endif /* !CONFIG_IPIPE */
+
 static void __init setup_IO_APIC_irqs(void)
 {
 	unsigned int ioapic, pin;
@@ -1689,11 +1715,12 @@ static unsigned int startup_ioapic_irq(struct irq_data *data)
 
 	raw_spin_lock_irqsave(&ioapic_lock, flags);
 	if (irq < nr_legacy_irqs()) {
-		legacy_pic->mask(irq);
+		startup_legacy_irq(irq);
 		if (legacy_pic->irq_pending(irq))
 			was_pending = 1;
 	}
 	__unmask_ioapic(data->chip_data);
+	ipipe_unlock_irq(irq);
 	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
 
 	return was_pending;
@@ -1701,7 +1728,7 @@ static unsigned int startup_ioapic_irq(struct irq_data *data)
 
 atomic_t irq_mis_count;
 
-#ifdef CONFIG_GENERIC_PENDING_IRQ
+#if defined(CONFIG_GENERIC_PENDING_IRQ) || (defined(CONFIG_IPIPE) && defined(CONFIG_SMP))
 static bool io_apic_level_ack_pending(struct mp_chip_data *data)
 {
 	struct irq_pin_list *entry;
@@ -1786,9 +1813,9 @@ static void ioapic_ack_level(struct irq_data *irq_data)
 {
 	struct irq_cfg *cfg = irqd_cfg(irq_data);
 	unsigned long v;
-	bool masked;
 	int i;
-
+#ifndef CONFIG_IPIPE
+	bool masked;
 	irq_complete_move(cfg);
 	masked = ioapic_irqd_mask(irq_data);
 
@@ -1846,6 +1873,24 @@ static void ioapic_ack_level(struct irq_data *irq_data)
 	}
 
 	ioapic_irqd_unmask(irq_data, masked);
+#else /* CONFIG_IPIPE */
+	/*
+	 * Prevent low priority IRQs grabbed by high priority domains
+	 * from being delayed, waiting for a high priority interrupt
+	 * handler running in a low priority domain to complete.
+	 * This code assumes hw interrupts off.
+	 */
+	i = cfg->vector;
+	v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
+	if (unlikely(!(v & (1 << (i & 0x1f))))) {
+		/* IO-APIC erratum: see comment above. */
+		atomic_inc(&irq_mis_count);
+		raw_spin_lock(&ioapic_lock);
+		_eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
+		raw_spin_unlock(&ioapic_lock);
+	}
+	__ack_APIC_irq();
+#endif /* CONFIG_IPIPE */
 }
 
 static void ioapic_ir_ack_level(struct irq_data *irq_data)
@@ -1941,6 +1986,69 @@ static int ioapic_irq_get_chip_state(struct irq_data *irqd,
 	return 0;
 }
 
+#ifdef CONFIG_IPIPE
+
+#ifdef CONFIG_SMP
+
+void move_xxapic_irq(struct irq_data *irq_data)
+{
+	unsigned int irq = irq_data->irq;
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct mp_chip_data *data = irq_data->chip_data;
+	struct irq_cfg *cfg = irqd_cfg(irq_data);
+
+	if (desc->handle_irq == &handle_edge_irq) {
+		raw_spin_lock(&desc->lock);
+		irq_complete_move(cfg);
+		irq_move_irq(irq_data);
+		raw_spin_unlock(&desc->lock);
+	} else if (desc->handle_irq == &handle_fasteoi_irq) {
+		raw_spin_lock(&desc->lock);
+		irq_complete_move(cfg);
+		if (unlikely(irqd_is_setaffinity_pending(irq_data))) {
+			if (!io_apic_level_ack_pending(data))
+				irq_move_masked_irq(irq_data);
+			unmask_ioapic_irq(irq_data);
+		}
+		raw_spin_unlock(&desc->lock);
+	} else
+		WARN_ON_ONCE(1);
+}
+
+#endif  /* CONFIG_SMP */
+
+static void hold_ioapic_irq(struct irq_data *irq_data)
+{
+	struct mp_chip_data *data = irq_data->chip_data;
+
+	raw_spin_lock(&ioapic_lock);
+	__mask_ioapic(data);
+	raw_spin_unlock(&ioapic_lock);
+	ioapic_ack_level(irq_data);
+}
+
+static void hold_ioapic_ir_irq(struct irq_data *irq_data)
+{
+	struct mp_chip_data *data = irq_data->chip_data;
+
+	raw_spin_lock(&ioapic_lock);
+	__mask_ioapic(data);
+	raw_spin_unlock(&ioapic_lock);
+	ioapic_ir_ack_level(irq_data);
+}
+
+static void release_ioapic_irq(struct irq_data *irq_data)
+{
+	struct mp_chip_data *data = irq_data->chip_data;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&ioapic_lock, flags);
+	__unmask_ioapic(data);
+	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+#endif	/* CONFIG_IPIPE */
+
 static struct irq_chip ioapic_chip __read_mostly = {
 	.name			= "IO-APIC",
 	.irq_startup		= startup_ioapic_irq,
@@ -1951,6 +2059,13 @@ static struct irq_chip ioapic_chip __read_mostly = {
 	.irq_set_affinity	= ioapic_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_get_irqchip_state	= ioapic_irq_get_chip_state,
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_SMP
+	.irq_move		= move_xxapic_irq,
+#endif
+	.irq_hold		= hold_ioapic_irq,
+	.irq_release		= release_ioapic_irq,
+#endif
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -1964,6 +2079,13 @@ static struct irq_chip ioapic_ir_chip __read_mostly = {
 	.irq_set_affinity	= ioapic_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_get_irqchip_state	= ioapic_irq_get_chip_state,
+#ifdef CONFIG_IPIPE
+#ifdef CONFIG_SMP
+	.irq_move		= move_xxapic_irq,
+#endif
+	.irq_hold		= hold_ioapic_ir_irq,
+	.irq_release		= release_ioapic_irq,
+#endif
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -1995,23 +2117,29 @@ static inline void init_IO_APIC_traps(void)
 
 static void mask_lapic_irq(struct irq_data *data)
 {
-	unsigned long v;
+	unsigned long v, flags;
 
+	flags = hard_cond_local_irq_save();
+	ipipe_lock_irq(data->irq);
 	v = apic_read(APIC_LVT0);
 	apic_write(APIC_LVT0, v | APIC_LVT_MASKED);
+	hard_cond_local_irq_restore(flags);
 }
 
 static void unmask_lapic_irq(struct irq_data *data)
 {
-	unsigned long v;
+	unsigned long v, flags;
 
+	flags = hard_cond_local_irq_save();
 	v = apic_read(APIC_LVT0);
 	apic_write(APIC_LVT0, v & ~APIC_LVT_MASKED);
+	ipipe_unlock_irq(data->irq);
+	hard_cond_local_irq_restore(flags);
 }
 
 static void ack_lapic_irq(struct irq_data *data)
 {
-	ack_APIC_irq();
+	__ack_APIC_irq();
 }
 
 static struct irq_chip lapic_chip __read_mostly = {
@@ -2019,6 +2147,9 @@ static struct irq_chip lapic_chip __read_mostly = {
 	.irq_mask	= mask_lapic_irq,
 	.irq_unmask	= unmask_lapic_irq,
 	.irq_ack	= ack_lapic_irq,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+	.irq_move	= move_xxapic_irq,
+#endif
 };
 
 static void lapic_register_intr(int irq)
@@ -2141,7 +2272,7 @@ static inline void __init check_timer(void)
 	/*
 	 * get/set the timer IRQ vector:
 	 */
-	legacy_pic->mask(0);
+	startup_legacy_irq(0);
 
 	/*
 	 * As IRQ0 is to be enabled in the 8259A, the virtual
@@ -2238,6 +2369,10 @@ static inline void __init check_timer(void)
 		    "...trying to set up timer as Virtual Wire IRQ...\n");
 
 	lapic_register_intr(0);
+#if defined(CONFIG_IPIPE) && defined(CONFIG_X86_64)
+	irq_to_desc(0)->ipipe_ack = __ipipe_ack_edge_irq;
+	irq_to_desc(0)->ipipe_end = __ipipe_nop_irq;
+#endif
 	apic_write(APIC_LVT0, APIC_DM_FIXED | cfg->vector);	/* Fixed mode */
 	legacy_pic->unmask(0);
 
@@ -2246,7 +2381,7 @@ static inline void __init check_timer(void)
 		goto out;
 	}
 	local_irq_disable();
-	legacy_pic->mask(0);
+	startup_legacy_irq(0);
 	apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
 	apic_printk(APIC_QUIET, KERN_INFO "..... failed.\n");
 
@@ -2624,6 +2759,21 @@ int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity)
 	return 0;
 }
 
+#ifdef CONFIG_IPIPE
+unsigned int __ipipe_get_ioapic_irq_vector(int irq)
+{
+	if (irq >= IPIPE_FIRST_APIC_IRQ && irq < IPIPE_NR_XIRQS)
+		return ipipe_apic_irq_vector(irq);
+	else if (irq == IRQ_MOVE_CLEANUP_VECTOR)
+		return irq;
+	else {
+		if (irq_cfg(irq) == NULL)
+			return ISA_IRQ_VECTOR(irq); /* Assume ISA. */
+		return irq_cfg(irq)->vector;
+	}
+}
+#endif /* CONFIG_IPIPE */
+
 /*
  * This function updates target affinity of IOAPIC interrupts to include
  * the CPUs which came online during SMP bringup.
@@ -3024,7 +3174,7 @@ int mp_irqdomain_alloc(struct irq_domain *domain, unsigned int virq,
 		mp_setup_entry(cfg, data, info->ioapic_entry);
 	mp_register_handler(virq, data->trigger);
 	if (virq < nr_legacy_irqs())
-		legacy_pic->mask(virq);
+		startup_legacy_irq(virq);
 	local_irq_restore(flags);
 
 	apic_printk(APIC_VERBOSE, KERN_DEBUG
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 6ca0f91372fd..b6784487a2c2 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -117,7 +117,9 @@ void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
 	 * to the APIC.
 	 */
 	unsigned int cfg;
+	unsigned long flags;
 
+	flags = hard_cond_local_irq_save();
 	/*
 	 * Wait for idle.
 	 */
@@ -136,6 +138,8 @@ void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
 	 * Send the IPI. The write to APIC_ICR fires this off.
 	 */
 	native_apic_mem_write(APIC_ICR, cfg);
+
+	hard_cond_local_irq_restore(flags);
 }
 
 /*
@@ -144,8 +148,9 @@ void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
  */
 void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest)
 {
-	unsigned long cfg;
+	unsigned long cfg, flags;
 
+	flags = hard_cond_local_irq_save();
 	/*
 	 * Wait for idle.
 	 */
@@ -169,6 +174,8 @@ void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int d
 	 * Send the IPI. The write to APIC_ICR fires this off.
 	 */
 	native_apic_mem_write(APIC_ICR, cfg);
+
+	hard_cond_local_irq_restore(flags);
 }
 
 void default_send_IPI_single_phys(int cpu, int vector)
@@ -191,12 +198,12 @@ void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector)
 	 * to an arbitrary mask, so I do a unicast to each CPU instead.
 	 * - mbligh
 	 */
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	for_each_cpu(query_cpu, mask) {
 		__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
 				query_cpu), vector, APIC_DEST_PHYSICAL);
 	}
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
@@ -208,14 +215,14 @@ void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
 
 	/* See Hack comment above */
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	for_each_cpu(query_cpu, mask) {
 		if (query_cpu == this_cpu)
 			continue;
 		__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
 				 query_cpu), vector, APIC_DEST_PHYSICAL);
 	}
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 /*
@@ -255,12 +262,12 @@ void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
 	 * should be modified to do 1 message per cluster ID - mbligh
 	 */
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	for_each_cpu(query_cpu, mask)
 		__default_send_IPI_dest_field(
 			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
 			vector, apic->dest_logical);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
@@ -272,7 +279,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
 
 	/* See Hack comment above */
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	for_each_cpu(query_cpu, mask) {
 		if (query_cpu == this_cpu)
 			continue;
@@ -280,7 +287,7 @@ void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
 			early_per_cpu(x86_cpu_to_logical_apicid, query_cpu),
 			vector, apic->dest_logical);
 		}
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 /*
@@ -294,10 +301,10 @@ void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector)
 	if (!mask)
 		return;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
 	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 /* must come after the send_IPI functions above for inlining */
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index a20873bbbed6..f96a721ecfe1 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -178,7 +178,10 @@ static struct irq_chip pci_msi_controller = {
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg	= irq_msi_compose_msg,
 	.irq_set_affinity	= msi_set_affinity,
-	.flags			= IRQCHIP_SKIP_SET_WAKE,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+	.irq_move		= move_xxapic_irq,
+#endif
+	.flags			= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
 };
 
 int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
@@ -279,7 +282,10 @@ static struct irq_chip pci_msi_ir_controller = {
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_set_vcpu_affinity	= irq_chip_set_vcpu_affinity_parent,
-	.flags			= IRQCHIP_SKIP_SET_WAKE,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+	.irq_move		= move_xxapic_irq,
+#endif
+	.flags			= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
 };
 
 static struct msi_domain_info pci_msi_ir_domain_info = {
@@ -322,7 +328,10 @@ static struct irq_chip dmar_msi_controller = {
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg	= irq_msi_compose_msg,
 	.irq_write_msi_msg	= dmar_msi_write_msg,
-	.flags			= IRQCHIP_SKIP_SET_WAKE,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+	.irq_move		= move_xxapic_irq,
+#endif
+	.flags			= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
 };
 
 static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
@@ -420,7 +429,10 @@ static struct irq_chip hpet_msi_controller __ro_after_init = {
 	.irq_retrigger = irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg = irq_msi_compose_msg,
 	.irq_write_msi_msg = hpet_msi_write_msg,
-	.flags = IRQCHIP_SKIP_SET_WAKE,
+#if defined(CONFIG_IPIPE) && defined(CONFIG_SMP)
+	.irq_move = move_xxapic_irq,
+#endif
+	.flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
 };
 
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index c8203694d9ce..6b7f54d13456 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -39,7 +39,7 @@ struct apic_chip_data {
 
 struct irq_domain *x86_vector_domain;
 EXPORT_SYMBOL_GPL(x86_vector_domain);
-static DEFINE_RAW_SPINLOCK(vector_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(vector_lock);
 static cpumask_var_t vector_searchmask;
 static struct irq_chip lapic_controller;
 static struct irq_matrix *vector_matrix;
@@ -119,7 +119,9 @@ static void apic_update_irq_cfg(struct irq_data *irqd, unsigned int vector,
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
 
+#ifndef CONFIG_IPIPE
 	lockdep_assert_held(&vector_lock);
+#endif
 
 	apicd->hw_irq_cfg.vector = vector;
 	apicd->hw_irq_cfg.dest_apicid = apic->calc_dest_apicid(cpu);
@@ -135,7 +137,9 @@ static void apic_update_vector(struct irq_data *irqd, unsigned int newvec,
 	struct irq_desc *desc = irq_data_to_desc(irqd);
 	bool managed = irqd_affinity_is_managed(irqd);
 
+#ifndef CONFIG_IPIPE
 	lockdep_assert_held(&vector_lock);
+#endif
 
 	trace_vector_update(irqd->irq, newvec, newcpu, apicd->vector,
 			    apicd->cpu);
@@ -225,7 +229,9 @@ assign_vector_locked(struct irq_data *irqd, const struct cpumask *dest)
 	unsigned int cpu = apicd->cpu;
 	int vector = apicd->vector;
 
+#ifndef CONFIG_IPIPE
 	lockdep_assert_held(&vector_lock);
+#endif
 
 	/*
 	 * If the current target CPU is online and in the new requested
@@ -332,7 +338,9 @@ static void clear_irq_vector(struct irq_data *irqd)
 	bool managed = irqd_affinity_is_managed(irqd);
 	unsigned int vector = apicd->vector;
 
+#ifndef CONFIG_IPIPE
 	lockdep_assert_held(&vector_lock);
+#endif
 
 	if (!vector)
 		return;
@@ -742,7 +750,9 @@ void lapic_online(void)
 {
 	unsigned int vector;
 
+#ifndef CONFIG_IPIPE
 	lockdep_assert_held(&vector_lock);
+#endif
 
 	/* Online the vector matrix array for this CPU */
 	irq_matrix_online(vector_matrix);
@@ -803,13 +813,17 @@ static int apic_retrigger_irq(struct irq_data *irqd)
 
 void apic_ack_irq(struct irq_data *irqd)
 {
+#ifndef CONFIG_IPIPE
 	irq_move_irq(irqd);
-	ack_APIC_irq();
+#endif /* !CONFIG_IPIPE */
+	__ack_APIC_irq();
 }
 
 void apic_ack_edge(struct irq_data *irqd)
 {
+#ifndef CONFIG_IPIPE
 	irq_complete_move(irqd_cfg(irqd));
+#endif /* !CONFIG_IPIPE */
 	apic_ack_irq(irqd);
 }
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index b0889c48a2ac..0e1126a7320c 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -42,7 +42,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 	u32 dest;
 
 	x2apic_wrmsr_fence();
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 
 	tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);
 	cpumask_copy(tmpmsk, mask);
@@ -66,7 +66,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
 	}
 
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 static void x2apic_send_IPI_mask(const struct cpumask *mask, int vector)
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index bc9693841353..2d6c4e33ba33 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -50,7 +50,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 
 	x2apic_wrmsr_fence();
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 
 	this_cpu = smp_processor_id();
 	for_each_cpu(query_cpu, mask) {
@@ -59,7 +59,7 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 		__x2apic_send_IPI_dest(per_cpu(x86_cpu_to_apicid, query_cpu),
 				       vector, APIC_DEST_PHYSICAL);
 	}
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 static void x2apic_send_IPI_mask(const struct cpumask *mask, int vector)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 5c7ee3df4d0b..343927455003 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -38,6 +38,9 @@ static void __used common(void)
 #endif
 
 	BLANK();
+#ifdef CONFIG_IPIPE
+	OFFSET(TASK_TI_ipipe, task_struct, thread_info.ipipe_flags);
+#endif
 	OFFSET(TASK_addr_limit, task_struct, thread.addr_limit);
 
 	BLANK();
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8a85c2e144a6..6daae55d8f87 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1736,6 +1736,7 @@ void syscall_init(void)
 	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
+#ifndef CONFIG_IPIPE
 DEFINE_PER_CPU(int, debug_stack_usage);
 DEFINE_PER_CPU(u32, debug_idt_ctr);
 
@@ -1754,6 +1755,7 @@ void debug_stack_reset(void)
 		load_current_idt();
 }
 NOKPROBE_SYMBOL(debug_stack_reset);
+#endif /* !CONFIG_IPIPE */
 
 #else	/* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
index 72182809b333..a9d703aab500 100644
--- a/arch/x86/kernel/cpu/mtrr/cyrix.c
+++ b/arch/x86/kernel/cpu/mtrr/cyrix.c
@@ -19,7 +19,7 @@ cyrix_get_arr(unsigned int reg, unsigned long *base,
 
 	arr = CX86_ARR_BASE + (reg << 1) + reg;	/* avoid multiplication by 3 */
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 
 	ccr3 = getCx86(CX86_CCR3);
 	setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10);	/* enable MAPEN */
@@ -29,7 +29,7 @@ cyrix_get_arr(unsigned int reg, unsigned long *base,
 	rcr = getCx86(CX86_RCR_BASE + reg);
 	setCx86(CX86_CCR3, ccr3);			/* disable MAPEN */
 
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 
 	shift = ((unsigned char *) base)[1] & 0x0f;
 	*base >>= PAGE_SHIFT;
@@ -180,6 +180,7 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
 			  unsigned long size, mtrr_type type)
 {
 	unsigned char arr, arr_type, arr_size;
+	unsigned long flags;
 
 	arr = CX86_ARR_BASE + (reg << 1) + reg;	/* avoid multiplication by 3 */
 
@@ -223,6 +224,8 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
 		}
 	}
 
+	flags = hard_local_irq_save();
+
 	prepare_set();
 
 	base <<= PAGE_SHIFT;
@@ -232,6 +235,8 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
 	setCx86(CX86_RCR_BASE + reg, arr_type);
 
 	post_set();
+
+	hard_local_irq_restore(flags);
 }
 
 typedef struct {
@@ -249,8 +254,10 @@ static unsigned char ccr_state[7] = { 0, 0, 0, 0, 0, 0, 0 };
 
 static void cyrix_set_all(void)
 {
+	unsigned long flags;
 	int i;
 
+	flags = hard_local_irq_save();
 	prepare_set();
 
 	/* the CCRs are not contiguous */
@@ -265,6 +272,7 @@ static void cyrix_set_all(void)
 	}
 
 	post_set();
+	hard_local_irq_restore(flags);
 }
 
 static const struct mtrr_ops cyrix_mtrr_ops = {
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index aa5c064a6a22..13ac247df330 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -797,7 +797,7 @@ static void generic_set_all(void)
 	unsigned long mask, count;
 	unsigned long flags;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	prepare_set();
 
 	/* Actually set the state */
@@ -807,7 +807,7 @@ static void generic_set_all(void)
 	pat_init();
 
 	post_set();
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 
 	/* Use the atomic bitops to update the global mask */
 	for (count = 0; count < sizeof(mask) * 8; ++count) {
@@ -831,12 +831,13 @@ static void generic_set_all(void)
 static void generic_set_mtrr(unsigned int reg, unsigned long base,
 			     unsigned long size, mtrr_type type)
 {
-	unsigned long flags;
+	unsigned long rflags, vflags;
 	struct mtrr_var_range *vr;
 
 	vr = &mtrr_state.var_ranges[reg];
 
-	local_irq_save(flags);
+	local_irq_save(vflags);
+	rflags = hard_local_irq_save();
 	prepare_set();
 
 	if (size == 0) {
@@ -857,7 +858,8 @@ static void generic_set_mtrr(unsigned int reg, unsigned long base,
 	}
 
 	post_set();
-	local_irq_restore(flags);
+	hard_local_irq_restore(rflags);
+	local_irq_restore(vflags);
 }
 
 int generic_validate_add_page(unsigned long base, unsigned long size,
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index cd8839027f66..23ca0afdd352 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -36,18 +36,13 @@ union fpregs_state init_fpstate __read_mostly;
  *
  *   - to debug kernel_fpu_begin()/end() correctness
  */
-static DEFINE_PER_CPU(bool, in_kernel_fpu);
+DEFINE_PER_CPU(bool, in_kernel_fpu);
 
 /*
  * Track which context is using the FPU on the CPU:
  */
 DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 
-static bool kernel_fpu_disabled(void)
-{
-	return this_cpu_read(in_kernel_fpu);
-}
-
 static bool interrupted_kernel_fpu_idle(void)
 {
 	return !kernel_fpu_disabled();
@@ -84,12 +79,13 @@ EXPORT_SYMBOL(irq_fpu_usable);
 
 void kernel_fpu_begin(void)
 {
-	preempt_disable();
+	unsigned long flags;
 
+	preempt_disable();
 	WARN_ON_FPU(!irq_fpu_usable());
-	WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
 
-	this_cpu_write(in_kernel_fpu, true);
+	flags = hard_cond_local_irq_save();
+	kernel_fpu_disable();
 
 	if (!(current->flags & PF_KTHREAD) &&
 	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
@@ -101,6 +97,7 @@ void kernel_fpu_begin(void)
 		copy_fpregs_to_fpstate(&current->thread.fpu);
 	}
 	__cpu_invalidate_fpregs_state();
+	hard_cond_local_irq_restore(flags);
 
 	if (boot_cpu_has(X86_FEATURE_XMM))
 		ldmxcsr(MXCSR_DEFAULT);
@@ -112,9 +109,11 @@ EXPORT_SYMBOL_GPL(kernel_fpu_begin);
 
 void kernel_fpu_end(void)
 {
-	WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
+	unsigned long flags;
 
-	this_cpu_write(in_kernel_fpu, false);
+	flags = hard_cond_local_irq_save();
+	kernel_fpu_enable();
+	hard_cond_local_irq_restore(flags);
 	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(kernel_fpu_end);
@@ -126,9 +125,11 @@ EXPORT_SYMBOL_GPL(kernel_fpu_end);
  */
 void fpu__save(struct fpu *fpu)
 {
+	unsigned long flags;
+
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	fpregs_lock();
+	flags = fpregs_lock();
 	trace_x86_fpu_before_save(fpu);
 
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
@@ -138,7 +139,7 @@ void fpu__save(struct fpu *fpu)
 	}
 
 	trace_x86_fpu_after_save(fpu);
-	fpregs_unlock();
+	fpregs_unlock(flags);
 }
 
 /*
@@ -174,6 +175,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
 {
 	struct fpu *dst_fpu = &dst->thread.fpu;
 	struct fpu *src_fpu = &src->thread.fpu;
+	unsigned long flags;
 
 	dst_fpu->last_cpu = -1;
 
@@ -196,14 +198,14 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
 	 * ( The function 'fails' in the FNSAVE case, which destroys
 	 *   register contents so we have to load them back. )
 	 */
-	fpregs_lock();
+	flags = fpregs_lock();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
 
 	else if (!copy_fpregs_to_fpstate(dst_fpu))
 		copy_kernel_to_fpregs(&dst_fpu->state);
 
-	fpregs_unlock();
+	fpregs_unlock(flags);
 
 	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
 
@@ -217,7 +219,7 @@ int fpu__copy(struct task_struct *dst, struct task_struct *src)
  * Activate the current task's in-memory FPU context,
  * if it has not been used before:
  */
-static void fpu__initialize(struct fpu *fpu)
+void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
@@ -270,6 +272,14 @@ void fpu__prepare_write(struct fpu *fpu)
 	__fpu_invalidate_fpregs_state(fpu);
 }
 
+#ifdef CONFIG_IPIPE
+#define FWAIT_PROLOGUE "sti\n"
+#define FWAIT_EPILOGUE "cli\n"
+#else
+#define FWAIT_PROLOGUE
+#define FWAIT_EPILOGUE
+#endif
+
 /*
  * Drops current FPU state: deactivates the fpregs and
  * the fpstate. NOTE: it still leaves previous contents
@@ -281,19 +291,22 @@ void fpu__prepare_write(struct fpu *fpu)
  */
 void fpu__drop(struct fpu *fpu)
 {
-	preempt_disable();
+	unsigned long flags;
 
+	flags = hard_preempt_disable();
 	if (fpu == &current->thread.fpu) {
 		/* Ignore delayed exceptions from user space */
-		asm volatile("1: fwait\n"
+		asm volatile(FWAIT_PROLOGUE
+			     "1: fwait\n"
 			     "2:\n"
+			     FWAIT_EPILOGUE
 			     _ASM_EXTABLE(1b, 2b));
 		fpregs_deactivate(fpu);
 	}
 
 	trace_x86_fpu_dropped(fpu);
 
-	preempt_enable();
+	hard_preempt_enable(flags);
 }
 
 /*
@@ -302,7 +315,9 @@ void fpu__drop(struct fpu *fpu)
  */
 static inline void copy_init_fpstate_to_fpregs(void)
 {
-	fpregs_lock();
+	unsigned long flags;
+
+	flags = fpregs_lock();
 
 	if (use_xsave())
 		copy_kernel_to_xregs(&init_fpstate.xsave, -1);
@@ -315,7 +330,7 @@ static inline void copy_init_fpstate_to_fpregs(void)
 		copy_init_pkru_to_fpregs();
 
 	fpregs_mark_activate();
-	fpregs_unlock();
+	fpregs_unlock(flags);
 }
 
 /*
@@ -326,8 +341,11 @@ static inline void copy_init_fpstate_to_fpregs(void)
  */
 void fpu__clear(struct fpu *fpu)
 {
+	unsigned long flags;
 	WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
 
+	flags = hard_cond_local_irq_save();
+
 	fpu__drop(fpu);
 
 	/*
@@ -336,6 +354,8 @@ void fpu__clear(struct fpu *fpu)
 	fpu__initialize(fpu);
 	if (static_cpu_has(X86_FEATURE_FPU))
 		copy_init_fpstate_to_fpregs();
+
+	hard_cond_local_irq_restore(flags);
 }
 
 /*
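
The fpu/core.c hunks above switch the FPU context sections to the flags-returning form of fpregs_lock(): the lock now hands back the hard interrupt state it saved, and fpregs_unlock() takes it as an argument. A sketch of the resulting calling convention, using a hypothetical caller whose body simply mirrors fpu__save() above:

	static void save_current_fpstate_example(void)
	{
		struct fpu *fpu = &current->thread.fpu;
		unsigned long flags;

		flags = fpregs_lock();	/* presumably also hard-disables IRQs under CONFIG_IPIPE */

		if (!test_thread_flag(TIF_NEED_FPU_LOAD))
			copy_fpregs_to_fpstate(fpu);

		fpregs_unlock(flags);	/* restores the saved hard IRQ state */
	}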
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 400a05e1c1c5..617099544e22 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -61,11 +61,12 @@ static inline int save_fsave_header(struct task_struct *tsk, void __user *buf)
 		struct xregs_state *xsave = &tsk->thread.fpu.state.xsave;
 		struct user_i387_ia32_struct env;
 		struct _fpstate_32 __user *fp = buf;
+		unsigned long flags;
 
-		fpregs_lock();
+		flags = fpregs_lock();
 		if (!test_thread_flag(TIF_NEED_FPU_LOAD))
 			copy_fxregs_to_kernel(&tsk->thread.fpu);
-		fpregs_unlock();
+		fpregs_unlock(flags);
 
 		convert_from_fxsr(&env, tsk);
 
@@ -165,6 +166,7 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 {
 	struct task_struct *tsk = current;
 	int ia32_fxstate = (buf != buf_fx);
+	unsigned long flags;
 	int ret;
 
 	ia32_fxstate &= (IS_ENABLED(CONFIG_X86_32) ||
@@ -185,14 +187,14 @@ retry:
 	 * userland's stack frame which will likely succeed. If it does not,
 	 * resolve the fault in the user memory and try again.
 	 */
-	fpregs_lock();
+	flags = fpregs_lock();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		__fpregs_load_activate();
 
 	pagefault_disable();
 	ret = copy_fpregs_to_sigframe(buf_fx);
 	pagefault_enable();
-	fpregs_unlock();
+	fpregs_unlock(flags);
 
 	if (ret) {
 		if (!fault_in_pages_writeable(buf_fx, fpu_user_xstate_size))
@@ -277,6 +279,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 	struct task_struct *tsk = current;
 	struct fpu *fpu = &tsk->thread.fpu;
 	struct user_i387_ia32_struct env;
+	unsigned long flags;
 	u64 xfeatures = 0;
 	int fx_only = 0;
 	int ret = 0;
@@ -343,17 +346,17 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * going through the kernel buffer with the enabled pagefault
 		 * handler.
 		 */
-		fpregs_lock();
+		flags = fpregs_lock();
 		pagefault_disable();
 		ret = copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only);
 		pagefault_enable();
 		if (!ret) {
 			fpregs_mark_activate();
-			fpregs_unlock();
+			fpregs_unlock(flags);
 			return 0;
 		}
 		fpregs_deactivate(fpu);
-		fpregs_unlock();
+		fpregs_unlock(flags);
 	}
 
 
@@ -373,7 +376,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 
 		sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
 
-		fpregs_lock();
+		flags = fpregs_lock();
 		if (unlikely(init_bv))
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
 		ret = copy_kernel_to_xregs_err(&fpu->state.xsave, xfeatures);
@@ -387,7 +390,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 
 		sanitize_restored_xstate(&fpu->state, envp, xfeatures, fx_only);
 
-		fpregs_lock();
+		flags = fpregs_lock();
 		if (use_xsave()) {
 			u64 init_bv = xfeatures_mask & ~XFEATURE_MASK_FPSSE;
 			copy_kernel_to_xregs(&init_fpstate.xsave, init_bv);
@@ -399,14 +402,14 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		if (ret)
 			goto err_out;
 
-		fpregs_lock();
+		flags = fpregs_lock();
 		ret = copy_kernel_to_fregs_err(&fpu->state.fsave);
 	}
 	if (!ret)
 		fpregs_mark_activate();
 	else
 		fpregs_deactivate(fpu);
-	fpregs_unlock();
+	fpregs_unlock(flags);
 
 err_out:
 	if (ret)
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index fe522691ac71..e8ade4992d2d 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -33,7 +33,7 @@
 static void init_8259A(int auto_eoi);
 
 static int i8259A_auto_eoi;
-DEFINE_RAW_SPINLOCK(i8259A_lock);
+IPIPE_DEFINE_RAW_SPINLOCK(i8259A_lock);
 
 /*
  * 8259A PIC functions to handle ISA devices:
@@ -61,6 +61,7 @@ static void mask_8259A_irq(unsigned int irq)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&i8259A_lock, flags);
+	ipipe_lock_irq(irq);
 	cached_irq_mask |= mask;
 	if (irq & 8)
 		outb(cached_slave_mask, PIC_SLAVE_IMR);
@@ -76,15 +77,18 @@ static void disable_8259A_irq(struct irq_data *data)
 
 static void unmask_8259A_irq(unsigned int irq)
 {
-	unsigned int mask = ~(1 << irq);
+	unsigned int mask = (1 << irq);
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&i8259A_lock, flags);
-	cached_irq_mask &= mask;
-	if (irq & 8)
-		outb(cached_slave_mask, PIC_SLAVE_IMR);
-	else
-		outb(cached_master_mask, PIC_MASTER_IMR);
+	if (cached_irq_mask & mask) {
+		cached_irq_mask &= ~mask;
+		if (irq & 8)
+			outb(cached_slave_mask, PIC_SLAVE_IMR);
+		else
+			outb(cached_master_mask, PIC_MASTER_IMR);
+		ipipe_unlock_irq(irq);
+	}
 	raw_spin_unlock_irqrestore(&i8259A_lock, flags);
 }
 
@@ -171,6 +175,18 @@ static void mask_and_ack_8259A(struct irq_data *data)
 	 */
 	if (cached_irq_mask & irqmask)
 		goto spurious_8259A_irq;
+#ifdef CONFIG_IPIPE
+	if (irq == 0) {
+		/*
+		 * Fast timer ack -- don't mask the line (unless it
+		 * looks spurious). We trace outb's in order to detect
+		 * broken hardware inducing large delays.
+		 */
+		outb(0x60, PIC_MASTER_CMD);	/* Specific EOI to master. */
+		raw_spin_unlock_irqrestore(&i8259A_lock, flags);
+		return;
+	}
+#endif /* CONFIG_IPIPE */
 	cached_irq_mask |= irqmask;
 
 handle_real_irq:
@@ -227,6 +243,7 @@ struct irq_chip i8259A_chip = {
 	.irq_disable	= disable_8259A_irq,
 	.irq_unmask	= enable_8259A_irq,
 	.irq_mask_ack	= mask_and_ack_8259A,
+	.flags		= IRQCHIP_PIPELINE_SAFE,
 };
 
 static char irq_trigger[2];
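
The i8259 conversion above shows what it takes to make an irqchip safe for pipelining: the mask and unmask handlers keep the pipeline informed through ipipe_lock_irq()/ipipe_unlock_irq(), and the chip advertises IRQCHIP_PIPELINE_SAFE. A bare-bones sketch for some hypothetical controller (the foo_* names and the register access are placeholders):

	static IPIPE_DEFINE_RAW_SPINLOCK(foo_lock);

	static void foo_mask_irq(struct irq_data *d)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&foo_lock, flags);
		ipipe_lock_irq(d->irq);		/* tell the pipeline the line is masked */
		/* ... program the controller's mask register here ... */
		raw_spin_unlock_irqrestore(&foo_lock, flags);
	}

	static struct irq_chip foo_chip = {
		.name		= "FOO-PIC",
		.irq_mask	= foo_mask_irq,
		.flags		= IRQCHIP_PIPELINE_SAFE,	/* may be handled from the head stage */
	};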
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 7bb4c3cbf4dc..ea919100022a 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -114,6 +114,10 @@ static const __initconst struct idt_data apic_idts[] = {
 	INTG(CALL_FUNCTION_SINGLE_VECTOR, call_function_single_interrupt),
 	INTG(IRQ_MOVE_CLEANUP_VECTOR,	irq_move_cleanup_interrupt),
 	INTG(REBOOT_VECTOR,		reboot_interrupt),
+#ifdef CONFIG_IPIPE
+	INTG(IPIPE_RESCHEDULE_VECTOR,	ipipe_reschedule_interrupt),
+	INTG(IPIPE_CRITICAL_VECTOR,	ipipe_critical_interrupt),
+#endif
 #endif
 
 #ifdef CONFIG_X86_THERMAL_VECTOR
@@ -144,6 +148,9 @@ static const __initconst struct idt_data apic_idts[] = {
 #endif
 	INTG(SPURIOUS_APIC_VECTOR,	spurious_interrupt),
 	INTG(ERROR_APIC_VECTOR,		error_interrupt),
+#ifdef CONFIG_IPIPE
+	INTG(IPIPE_HRTIMER_VECTOR,	ipipe_hrtimer_interrupt),
+#endif
 #endif
 };
 
@@ -308,9 +315,26 @@ void __init idt_setup_apic_and_irq_gates(void)
 {
 	int i = FIRST_EXTERNAL_VECTOR;
 	void *entry;
+	unsigned int __maybe_unused cpu, ret;
 
 	idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);
 
+#if defined(CONFIG_SMP) && defined(CONFIG_IPIPE)
+	/*
+	 * The cleanup vector is not part of the system vector range
+	 * but rather belongs to the external IRQ range, however we
+	 * still need to map it early to a legit interrupt number for
+	 * pipelining. Allocate a specific descriptor manually for it,
+	 * using IRQ_MOVE_CLEANUP_VECTOR as both the vector number and
+	 * interrupt number, so that we know the latter at build time.
+	 */
+	ret = irq_alloc_descs(IRQ_MOVE_CLEANUP_VECTOR, 0, 1, 0);
+	BUG_ON(IRQ_MOVE_CLEANUP_VECTOR != ret);
+	for_each_possible_cpu(cpu)
+		per_cpu(vector_irq, cpu)[IRQ_MOVE_CLEANUP_VECTOR] =
+			irq_to_desc(IRQ_MOVE_CLEANUP_VECTOR);
+#endif
+
 	for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
 		entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
 		set_intr_gate(i, entry);
diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
new file mode 100644
index 000000000000..b017eddf8e1f
--- /dev/null
+++ b/arch/x86/kernel/ipipe.c
@@ -0,0 +1,553 @@
+/*   -*- linux-c -*-
+ *   linux/arch/x86/kernel/ipipe.c
+ *
+ *   Copyright (C) 2002-2012 Philippe Gerum.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ *   USA; either version 2 of the License, or (at your option) any later
+ *   version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ *   Architecture-dependent I-PIPE support for x86.
+ */
+
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/sched/debug.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include <linux/clockchips.h>
+#include <linux/kprobes.h>
+#include <linux/mm.h>
+#include <linux/extable.h>
+#include <linux/ipipe_tickdev.h>
+#include <asm/asm-offsets.h>
+#include <asm/unistd.h>
+#include <asm/processor.h>
+#include <asm/atomic.h>
+#include <asm/hw_irq.h>
+#include <asm/irq.h>
+#include <asm/desc.h>
+#include <asm/io.h>
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+#include <asm/bitops.h>
+#include <asm/mpspec.h>
+#ifdef CONFIG_X86_IO_APIC
+#include <asm/io_apic.h>
+#endif	/* CONFIG_X86_IO_APIC */
+#include <asm/apic.h>
+#endif	/* CONFIG_X86_LOCAL_APIC */
+#include <asm/fpu/internal.h>
+#include <asm/traps.h>
+#include <asm/tsc.h>
+#include <asm/mce.h>
+#include <asm/mmu_context.h>
+
+void smp_apic_timer_interrupt(struct pt_regs *regs);
+void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs);
+void smp_kvm_posted_intr_ipi(struct pt_regs *regs);
+void smp_spurious_interrupt(struct pt_regs *regs);
+void smp_error_interrupt(struct pt_regs *regs);
+void smp_x86_platform_ipi(struct pt_regs *regs);
+void smp_irq_work_interrupt(struct pt_regs *regs);
+void smp_reschedule_interrupt(struct pt_regs *regs);
+void smp_call_function_interrupt(struct pt_regs *regs);
+void smp_call_function_single_interrupt(struct pt_regs *regs);
+void smp_irq_move_cleanup_interrupt(void);
+void smp_reboot_interrupt(void);
+void smp_thermal_interrupt(struct pt_regs *regs);
+void smp_threshold_interrupt(struct pt_regs *regs);
+
+void __ipipe_do_IRQ(unsigned int irq, void *cookie);
+
+DEFINE_PER_CPU(unsigned long, __ipipe_cr2);
+EXPORT_PER_CPU_SYMBOL_GPL(__ipipe_cr2);
+
+int ipipe_get_sysinfo(struct ipipe_sysinfo *info)
+{
+	info->sys_nr_cpus = num_online_cpus();
+	info->sys_cpu_freq = __ipipe_cpu_freq;
+	info->sys_hrtimer_irq = per_cpu(ipipe_percpu.hrtimer_irq, 0);
+	info->sys_hrtimer_freq = __ipipe_hrtimer_freq;
+	info->sys_hrclock_freq = __ipipe_hrclock_freq;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipipe_get_sysinfo);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+
+static void __ipipe_noack_apic(struct irq_desc *desc)
+{
+}
+
+static void __ipipe_ack_apic(struct irq_desc *desc)
+{
+	__ack_APIC_irq();
+}
+
+#endif	/* CONFIG_X86_LOCAL_APIC */
+
+/*
+ * __ipipe_enable_pipeline() -- We are running on the boot CPU, hw
+ * interrupts are off, and secondary CPUs are still lost in space.
+ */
+void __init __ipipe_enable_pipeline(void)
+{
+	unsigned int irq;
+
+#ifdef CONFIG_X86_LOCAL_APIC
+
+	/* Map the APIC system vectors. */
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(LOCAL_TIMER_VECTOR),
+			  __ipipe_do_IRQ, smp_apic_timer_interrupt,
+			  __ipipe_ack_apic);
+
+#ifdef CONFIG_HAVE_KVM
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(POSTED_INTR_WAKEUP_VECTOR),
+			  __ipipe_do_IRQ, smp_kvm_posted_intr_wakeup_ipi,
+			  __ipipe_ack_apic);
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(POSTED_INTR_VECTOR),
+			  __ipipe_do_IRQ, smp_kvm_posted_intr_ipi,
+			  __ipipe_ack_apic);
+#endif
+
+#if defined(CONFIG_X86_MCE_AMD) && defined(CONFIG_X86_64)
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(DEFERRED_ERROR_VECTOR),
+			  __ipipe_do_IRQ, smp_deferred_error_interrupt,
+			  __ipipe_ack_apic);
+#endif
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(SPURIOUS_APIC_VECTOR),
+			  __ipipe_do_IRQ, smp_spurious_interrupt,
+			  __ipipe_noack_apic);
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(ERROR_APIC_VECTOR),
+			  __ipipe_do_IRQ, smp_error_interrupt,
+			  __ipipe_ack_apic);
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(THERMAL_APIC_VECTOR),
+			  __ipipe_do_IRQ, smp_thermal_interrupt,
+			  __ipipe_ack_apic);
+#endif /* CONFIG_X86_THERMAL_VECTOR */
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(THRESHOLD_APIC_VECTOR),
+			  __ipipe_do_IRQ, smp_threshold_interrupt,
+			  __ipipe_ack_apic);
+#endif /* CONFIG_X86_MCE_THRESHOLD */
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(X86_PLATFORM_IPI_VECTOR),
+			  __ipipe_do_IRQ, smp_x86_platform_ipi,
+			  __ipipe_ack_apic);
+
+	/*
+	 * We expose two high-priority APIC vectors which the head
+	 * domain may use for hires timing and SMP rescheduling.
+	 * We should never receive them in the root domain.
+	 */
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(IPIPE_HRTIMER_VECTOR),
+			  __ipipe_do_IRQ, smp_spurious_interrupt,
+			  __ipipe_ack_apic);
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(IPIPE_RESCHEDULE_VECTOR),
+			  __ipipe_do_IRQ, smp_spurious_interrupt,
+			  __ipipe_ack_apic);
+
+#ifdef CONFIG_IRQ_WORK
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(IRQ_WORK_VECTOR),
+			  __ipipe_do_IRQ, smp_irq_work_interrupt,
+			  __ipipe_ack_apic);
+#endif /* CONFIG_IRQ_WORK */
+
+#endif	/* CONFIG_X86_LOCAL_APIC */
+
+#ifdef CONFIG_SMP
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(RESCHEDULE_VECTOR),
+			  __ipipe_do_IRQ, smp_reschedule_interrupt,
+			  __ipipe_ack_apic);
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(CALL_FUNCTION_VECTOR),
+			  __ipipe_do_IRQ, smp_call_function_interrupt,
+			  __ipipe_ack_apic);
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(CALL_FUNCTION_SINGLE_VECTOR),
+			  __ipipe_do_IRQ, smp_call_function_single_interrupt,
+			  __ipipe_ack_apic);
+
+	ipipe_request_irq(ipipe_root_domain,
+			  IRQ_MOVE_CLEANUP_VECTOR,
+			  __ipipe_do_IRQ, smp_irq_move_cleanup_interrupt,
+			  __ipipe_ack_apic);
+
+	ipipe_request_irq(ipipe_root_domain,
+			  ipipe_apic_vector_irq(REBOOT_VECTOR),
+			  __ipipe_do_IRQ, smp_reboot_interrupt,
+			  __ipipe_ack_apic);
+#endif	/* CONFIG_SMP */
+
+	/*
+	 * Finally, request the remaining ISA and IO-APIC
+	 * interrupts. Requesting an interrupt which has already been
+	 * requested just fails with a silent -EBUSY, which is fine.
+	 */
+	for (irq = 0; irq < IPIPE_NR_XIRQS; irq++)
+		ipipe_request_irq(ipipe_root_domain, irq,
+				  __ipipe_do_IRQ, do_IRQ,
+				  NULL);
+}
+
+#ifdef CONFIG_SMP
+int irq_activate(struct irq_desc *desc);
+
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask)
+{
+	struct irq_desc *desc;
+	struct irq_chip *chip;
+	int err;
+
+	cpumask_and(&cpumask, &cpumask, cpu_online_mask);
+	if (cpumask_empty(&cpumask) || ipipe_virtual_irq_p(irq))
+		return -EINVAL;
+
+	desc = irq_to_desc(irq);
+	if (desc == NULL)
+		return -EINVAL;
+
+	chip = irq_desc_get_chip(desc);
+	if (chip->irq_set_affinity == NULL)
+		return -ENOSYS;
+
+	err = irq_activate(desc);
+	if (err)
+		return err;
+
+	chip->irq_set_affinity(irq_get_irq_data(irq), &cpumask, true);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipipe_set_irq_affinity);
+
+void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask)
+{
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+
+	cpumask_clear_cpu(ipipe_processor_id(), &cpumask);
+	if (likely(!cpumask_empty(&cpumask)))
+		apic->send_IPI_mask(&cpumask, ipipe_apic_irq_vector(ipi));
+
+	hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_send_ipi);
+
+void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd)
+{
+	unsigned int ipi = IPIPE_CRITICAL_IPI;
+
+	ipd->irqs[ipi].ackfn = __ipipe_ack_apic;
+	ipd->irqs[ipi].handler = __ipipe_do_critical_sync;
+	ipd->irqs[ipi].cookie = NULL;
+	ipd->irqs[ipi].control = IPIPE_HANDLE_MASK|IPIPE_STICKY_MASK;
+}
+
+#endif	/* CONFIG_SMP */
+
+void __ipipe_halt_root(int use_mwait)
+{
+	struct ipipe_percpu_domain_data *p;
+
+	/* Emulate sti+hlt sequence over the root domain. */
+
+	hard_local_irq_disable();
+
+	p = ipipe_this_cpu_root_context();
+
+	trace_hardirqs_on();
+	__clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+	if (unlikely(__ipipe_ipending_p(p))) {
+		__ipipe_sync_stage();
+		hard_local_irq_enable();
+	} else {
+#ifdef CONFIG_IPIPE_TRACE_IRQSOFF
+		ipipe_trace_end(0x8000000E);
+#endif /* CONFIG_IPIPE_TRACE_IRQSOFF */
+		if (use_mwait)
+			asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
+				     :: "a" (0), "c" (0));
+		else
+			asm volatile("sti; hlt": : :"memory");
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_halt_root);
+
+static inline void __ipipe_fixup_if(bool stalled, struct pt_regs *regs)
+{
+	/*
+	 * Have the saved hw state look like the domain stall bit, so
+	 * that __ipipe_unstall_iret_root() restores the proper
+	 * pipeline state for the root stage upon exit.
+	 */
+	if (stalled)
+		regs->flags &= ~X86_EFLAGS_IF;
+	else
+		regs->flags |= X86_EFLAGS_IF;
+}
+
+dotraplinkage int __ipipe_trap_prologue(struct pt_regs *regs, int trapnr, unsigned long *flags)
+{
+	bool entry_irqs_off = hard_irqs_disabled();
+	struct ipipe_domain *ipd;
+	unsigned long cr2;
+
+	if (trapnr == X86_TRAP_PF)
+		cr2 = native_read_cr2();
+
+	/*
+	 * KGDB and ftrace may poke int3/debug ops into the kernel
+	 * code. Trap those exceptions early, apply conditional fixups
+	 * to the interrupt state depending on the current domain, then
+	 * let the regular handler see them.
+	 */
+	if (unlikely(!user_mode(regs) &&
+		     (trapnr == X86_TRAP_DB || trapnr == X86_TRAP_BP))) {
+
+		if (ipipe_root_p)
+			goto root_fixup;
+
+		/*
+		 * Skip interrupt state fixup from the head domain,
+		 * but do call the regular handler which is assumed to
+		 * run fine within such context.
+		 */
+		return -1;
+	}
+
+	/*
+	 * Now that we have filtered out the debug traps which may
+	 * happen anywhere in kernel code, detect attempts to probe
+	 * kernel memory (i.e. calls to probe_kernel_{read,
+	 * write}()). If such a fault happened over the head domain,
+	 * apply the fixup immediately and return on success. If the
+	 * fixup fails, the kernel is likely to crash, but follow the
+	 * standard recovery procedure in that case anyway.
+	 */
+	if (unlikely(!ipipe_root_p && faulthandler_disabled())) {
+		if (fixup_exception(regs, trapnr, regs->orig_ax, 0))
+			return 1;
+	}
+
+	if (unlikely(__ipipe_notify_trap(trapnr, regs)))
+		return 1;
+
+	if (likely(ipipe_root_p)) {
+	root_fixup:
+		/*
+		 * If no head domain is installed, or if we faulted in the
+		 * iret path of x86-32, regs->flags does not match the root
+		 * domain state. The fault handler may evaluate it, so fix it
+		 * up with the current state.
+		 */
+		local_save_flags(*flags);
+		__ipipe_fixup_if(raw_irqs_disabled_flags(*flags), regs);
+
+		/*
+		 * Sync Linux interrupt state with hardware state on
+		 * entry.
+		 */
+		if (entry_irqs_off)
+			local_irq_disable();
+	} else {
+		/* Plan for restoring the original flags at fault. */
+		*flags = regs->flags;
+
+		/*
+		 * Detect unhandled faults over the head domain,
+		 * switching to root so that it can handle the fault
+		 * cleanly.
+		 */
+		hard_local_irq_disable();
+		ipd = __ipipe_current_domain;
+		__ipipe_set_current_domain(ipipe_root_domain);
+
+		ipipe_trace_panic_freeze();
+
+		/*
+		 * Silence the warnings from this debug checker so that the
+		 * report focuses on the actual bug.
+		 */
+		if (test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status))
+			ipipe_context_check_off();
+
+		/* Sync Linux interrupt state with hardware state on entry. */
+		if (entry_irqs_off)
+			local_irq_disable();
+
+		/* Always warn about user land and unfixable faults. */
+		if (user_mode(regs) ||
+		    !search_exception_tables(instruction_pointer(regs))) {
+			printk(KERN_ERR "BUG: Unhandled exception over domain"
+			       " %s at 0x%lx - switching to ROOT\n",
+			       ipd->name, instruction_pointer(regs));
+			dump_stack();
+		} else if (IS_ENABLED(CONFIG_IPIPE_DEBUG)) {
+			/* Also report fixable ones when debugging is enabled. */
+			printk(KERN_WARNING "WARNING: Fixable exception over "
+			       "domain %s at 0x%lx - switching to ROOT\n",
+			       ipd->name, instruction_pointer(regs));
+			dump_stack();
+		}
+	}
+
+	if (trapnr == X86_TRAP_PF)
+		write_cr2(cr2);
+
+	return 0;
+}
+
+dotraplinkage
+void __ipipe_trap_epilogue(struct pt_regs *regs,
+			   unsigned long flags, unsigned long regs_flags)
+{
+	ipipe_restore_root(raw_irqs_disabled_flags(flags));
+	__ipipe_fixup_if(raw_irqs_disabled_flags(regs_flags), regs);
+}
+
+static inline int __ipipe_irq_from_vector(int vector, int *irq)
+{
+	struct irq_desc *desc;
+
+	if (vector >= FIRST_SYSTEM_VECTOR) {
+		*irq = ipipe_apic_vector_irq(vector);
+		return 0;
+	}
+
+	desc = __this_cpu_read(vector_irq[vector]);
+	if (likely(!IS_ERR_OR_NULL(desc))) {
+		*irq = irq_desc_get_irq(desc);
+		return 0;
+	}
+
+	if (vector == IRQ_MOVE_CLEANUP_VECTOR) {
+		*irq = vector;
+		return 0;
+	}
+
+#ifdef CONFIG_X86_LOCAL_APIC
+	__ack_APIC_irq();
+#endif
+	pr_err("unexpected IRQ trap at vector %#x\n", vector);
+	return -1;
+}
+
+int __ipipe_handle_irq(struct pt_regs *regs)
+{
+	struct ipipe_percpu_data *p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+	int irq, vector = regs->orig_ax, flags = 0;
+	struct pt_regs *tick_regs;
+
+	if (likely(vector < 0)) {
+		if (__ipipe_irq_from_vector(~vector, &irq) < 0)
+			goto out;
+	} else { /* Software-generated. */
+		irq = vector;
+		flags = IPIPE_IRQF_NOACK;
+	}
+
+	ipipe_trace_irqbegin(irq, regs);
+
+	/*
+	 * Given our deferred dispatching model for regular IRQs, we
+	 * only record CPU regs for the last timer interrupt, so that
+	 * the timer handler charges CPU time properly. It is assumed
+	 * that no other interrupt handler cares about such information.
+	 */
+	if (irq == p->hrtimer_irq || p->hrtimer_irq == -1) {
+		tick_regs = &p->tick_regs;
+		tick_regs->flags = regs->flags;
+		tick_regs->cs = regs->cs;
+		tick_regs->ip = regs->ip;
+		tick_regs->bp = regs->bp;
+#ifdef CONFIG_X86_64
+		tick_regs->ss = regs->ss;
+		tick_regs->sp = regs->sp;
+#endif
+		if (!__ipipe_root_p)
+			tick_regs->flags &= ~X86_EFLAGS_IF;
+	}
+
+	__ipipe_dispatch_irq(irq, flags);
+
+	if (user_mode(regs) && ipipe_test_thread_flag(TIP_MAYDAY))
+		__ipipe_call_mayday(regs);
+
+	ipipe_trace_irqend(irq, regs);
+
+out:
+	if (!__ipipe_root_p ||
+	    test_bit(IPIPE_STALL_FLAG, &__ipipe_root_status))
+		return 0;
+
+	return 1;
+}
+
+void __ipipe_arch_share_current(int flags)
+{
+	struct task_struct *p = current;
+
+	/*
+	 * Set up a clean extended FPU state for kernel threads.
+	 */
+	if (p->mm == NULL)
+		memcpy(&p->thread.fpu.state,
+		       &init_fpstate, fpu_kernel_xstate_size);
+}
+
+struct task_struct *__switch_to(struct task_struct *prev_p,
+				struct task_struct *next_p);
+EXPORT_SYMBOL_GPL(do_munmap);
+EXPORT_SYMBOL_GPL(__switch_to);
+EXPORT_SYMBOL_GPL(show_stack);
+
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+EXPORT_SYMBOL(tasklist_lock);
+#endif /* CONFIG_SMP || CONFIG_DEBUG_SPINLOCK */
+
+#if defined(CONFIG_CC_STACKPROTECTOR) && defined(CONFIG_X86_64)
+EXPORT_PER_CPU_SYMBOL_GPL(irq_stack_union);
+#endif
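
arch/x86/kernel/ipipe.c above wires every vector the root domain cares about through ipipe_request_irq(). A real-time core sitting on the head domain would attach its own handlers the same way; a hedged sketch, where rt_handler() and RT_DEVICE_IRQ are hypothetical placeholders:

	#define RT_DEVICE_IRQ	42	/* hypothetical device interrupt */

	static void rt_handler(unsigned int irq, void *cookie)
	{
		/* runs from the head stage, typically with hard IRQs off */
	}

	static int attach_rt_irq(void)
	{
		return ipipe_request_irq(ipipe_head_domain, RT_DEVICE_IRQ,
					 rt_handler, NULL /* cookie */,
					 NULL /* default acknowledge */);
	}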
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 21efee32e2b1..c1021c298092 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -49,7 +49,7 @@ void ack_bad_irq(unsigned int irq)
 	 * completely.
 	 * But only ack when the APIC is enabled -AK
 	 */
-	ack_APIC_irq();
+	__ack_APIC_irq();
 }
 
 #define irq_stats(x)		(&per_cpu(irq_stat, x))
@@ -237,12 +237,13 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
 	/* high bit used in ret_from_ code  */
 	unsigned vector = ~regs->orig_ax;
 
+	desc = __this_cpu_read(vector_irq[vector]);
+	__ipipe_move_root_irq(desc);
 	entering_irq();
 
 	/* entering_irq() tells RCU that we're not quiescent.  Check it. */
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
 
-	desc = __this_cpu_read(vector_irq[vector]);
 	if (likely(!IS_ERR_OR_NULL(desc))) {
 		if (IS_ENABLED(CONFIG_X86_32))
 			handle_irq(desc, regs);
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 6b32ab009c19..b998d982cb4a 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -70,3 +70,30 @@ int irq_init_percpu_irqstack(unsigned int cpu)
 		return 0;
 	return map_irq_stack(cpu);
 }
+
+#ifdef CONFIG_IPIPE
+
+void __ipipe_do_IRQ(unsigned int irq, void *cookie)
+{
+	struct pt_regs *regs = raw_cpu_ptr(&ipipe_percpu.tick_regs);
+	struct pt_regs *old_regs = set_irq_regs(regs);
+	unsigned int (*handler)(struct pt_regs *regs);
+	struct irq_desc *desc;
+
+	handler = (typeof(handler))cookie;
+
+	entering_irq();
+
+	if (handler == do_IRQ) {
+		desc = irq_to_desc(irq);
+		generic_handle_irq_desc(desc);
+	} else {
+		handler(regs);
+	}
+
+	exiting_irq();
+
+	set_irq_regs(old_regs);
+}
+
+#endif
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index c44fe7d8d9a4..c454c0312bd7 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -577,9 +577,9 @@ kgdb_notify(struct notifier_block *self, unsigned long cmd, void *ptr)
 	unsigned long flags;
 	int ret;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	ret = __kgdb_notify(ptr, cmd);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 
 	return ret;
 }
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5bb001c0c771..50b7eff3f26b 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -488,6 +488,7 @@ static DEFINE_PER_CPU(unsigned long, nmi_cr2);
  */
 static DEFINE_PER_CPU(int, update_debug_stack);
 
+#ifndef CONFIG_IPIPE
 static bool notrace is_debug_stack(unsigned long addr)
 {
 	struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
@@ -504,6 +505,9 @@ static bool notrace is_debug_stack(unsigned long addr)
 	return addr >= bot && addr < top;
 }
 NOKPROBE_SYMBOL(is_debug_stack);
+#else /* IPIPE */
+static bool notrace is_debug_stack(unsigned long addr) { return false; }
+#endif
 #endif
 
 dotraplinkage notrace void
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 571e38c9ee1d..b1ebdc01a14a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -116,8 +116,16 @@ void exit_thread(struct task_struct *tsk)
 	if (bp) {
 		struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu());
 
-		t->io_bitmap_ptr = NULL;
+		/*
+		 * The caller may be preempted via I-pipe: to make
+		 * sure TIF_IO_BITMAP always denotes a valid I/O
+		 * bitmap when set, we clear the flag _before_ the I/O
+		 * bitmap pointer. There is no cache coherence issue
+		 * here as migration is currently locked (the primary
+		 * domain may never migrate either).
+		 */
 		clear_thread_flag(TIF_IO_BITMAP);
+		t->io_bitmap_ptr = NULL;
 		/*
 		 * Careful, clear this in the TSS too:
 		 */
@@ -426,7 +434,9 @@ static __always_inline void __speculation_ctrl_update(unsigned long tifp,
 	u64 msr = x86_spec_ctrl_base;
 	bool updmsr = false;
 
+#ifndef CONFIG_IPIPE
 	lockdep_assert_irqs_disabled();
+#endif
 
 	/* Handle change of TIF_SSBD depending on the mitigation method. */
 	if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) {
@@ -474,9 +484,9 @@ void speculation_ctrl_update(unsigned long tif)
 	unsigned long flags;
 
 	/* Forced update. Make sure all relevant TIF flags are different */
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	__speculation_ctrl_update(~tif, tif);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 /* Called from seccomp/prctl update */
@@ -589,7 +599,7 @@ bool xen_set_default_idle(void)
 
 void stop_this_cpu(void *dummy)
 {
-	local_irq_disable();
+	hard_local_irq_disable();
 	/*
 	 * Remove this CPU:
 	 */
@@ -685,7 +695,11 @@ static __cpuidle void mwait_idle(void)
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
 		if (!need_resched())
+#ifdef CONFIG_IPIPE
+			__ipipe_halt_root(1);
+#else
 			__sti_mwait(0, 0);
+#endif
 		else
 			local_irq_enable();
 		trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
@@ -745,6 +759,10 @@ void __init arch_post_acpi_subsys_init(void)
 	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
 		mark_tsc_unstable("TSC halt in AMD C1E");
 	pr_info("System has AMD C1E enabled\n");
+#ifdef CONFIG_IPIPE
+	pr_info("I-pipe: will not be able to use LAPIC as a tick device\n"
+		"I-pipe: disable C1E power state in your BIOS\n");
+#endif
 }
 
 static int __init idle_setup(char *str)
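
With CONFIG_IPIPE, the idle entry points can no longer issue a literal sti;hlt pair, since re-enabling the virtual root-stage state and halting must happen atomically with respect to pending root IRQs; this is what __ipipe_halt_root() provides, as used by mwait_idle() above. A hypothetical idle routine following the same pattern:

	static void pipelined_idle_example(void)
	{
		if (!need_resched())
			__ipipe_halt_root(0);	/* emulate sti;hlt over the root domain */
		else
			local_irq_enable();
	}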
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index da3cc3a10d63..63575cf12f9e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -507,7 +507,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	struct thread_struct *next = &next_p->thread;
 	struct fpu *prev_fpu = &prev->fpu;
 	struct fpu *next_fpu = &next->fpu;
-	int cpu = smp_processor_id();
+	int cpu = raw_smp_processor_id();
 
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(irq_count) != -1);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index b8d4e9c3c070..5a75d586e5cb 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -213,10 +213,10 @@ static void native_stop_other_cpus(int wait)
 			udelay(1);
 	}
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 /*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8367bd7a9a81..704a5d11b1a2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1136,7 +1136,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int apicid = apic->cpu_present_to_apicid(cpu);
 	int cpu0_nmi_registered = 0;
-	unsigned long flags;
+	unsigned long vflags, rflags;
 	int err, ret = 0;
 
 	lockdep_assert_irqs_enabled();
@@ -1187,9 +1187,11 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	 * Check TSC synchronization with the AP (keep irqs disabled
 	 * while doing so):
 	 */
-	local_irq_save(flags);
+	local_irq_save(vflags);
+	rflags = hard_local_irq_save();
 	check_tsc_sync_source(cpu);
-	local_irq_restore(flags);
+	hard_local_irq_restore(rflags);
+	local_irq_restore(vflags);
 
 	while (!cpu_online(cpu)) {
 		cpu_relax();
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4bb0f8447112..e36568748768 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -14,6 +14,7 @@
 
 #include <linux/context_tracking.h>
 #include <linux/interrupt.h>
+#include <linux/ipipe.h>
 #include <linux/kallsyms.h>
 #include <linux/spinlock.h>
 #include <linux/kprobes.h>
@@ -77,13 +78,13 @@ DECLARE_BITMAP(system_vectors, NR_VECTORS);
 static inline void cond_local_irq_enable(struct pt_regs *regs)
 {
 	if (regs->flags & X86_EFLAGS_IF)
-		local_irq_enable();
+		hard_local_irq_enable_notrace();
 }
 
 static inline void cond_local_irq_disable(struct pt_regs *regs)
 {
 	if (regs->flags & X86_EFLAGS_IF)
-		local_irq_disable();
+		hard_local_irq_disable_notrace();
 }
 
 /*
@@ -529,7 +530,7 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	}
 
 	if (v8086_mode(regs)) {
-		local_irq_enable();
+		hard_local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
 		return;
 	}
@@ -912,7 +913,7 @@ NOKPROBE_SYMBOL(do_device_not_available);
 dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
 {
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
-	local_irq_enable();
+	hard_local_irq_enable();
 
 	if (notify_die(DIE_TRAP, "iret exception", regs, error_code,
 			X86_TRAP_IRET, SIGILL) != NOTIFY_STOP) {
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7e322e2daaf5..835856efd71f 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -752,11 +752,11 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
 		 * calibration, which will take at least 50ms, and
 		 * read the end value.
 		 */
-		local_irq_save(flags);
+		flags = hard_local_irq_save();
 		tsc1 = tsc_read_refs(&ref1, hpet);
 		tsc_pit_khz = pit_calibrate_tsc(latch, ms, loopmin);
 		tsc2 = tsc_read_refs(&ref2, hpet);
-		local_irq_restore(flags);
+		hard_local_irq_restore(flags);
 
 		/* Pick the lowest PIT TSC calibration so far */
 		tsc_pit_min = min(tsc_pit_min, tsc_pit_khz);
@@ -865,9 +865,9 @@ unsigned long native_calibrate_cpu_early(void)
 	if (!fast_calibrate)
 		fast_calibrate = cpu_khz_from_msr();
 	if (!fast_calibrate) {
-		local_irq_save(flags);
+		flags = hard_local_irq_save();
 		fast_calibrate = quick_pit_calibrate();
-		local_irq_restore(flags);
+		hard_local_irq_restore(flags);
 	}
 	return fast_calibrate;
 }
@@ -1130,7 +1130,7 @@ static struct clocksource clocksource_tsc_early = {
  * this one will immediately take over. We will only register if TSC has
  * been found good.
  */
-static struct clocksource clocksource_tsc = {
+struct clocksource clocksource_tsc = {
 	.name                   = "tsc",
 	.rating                 = 300,
 	.read                   = read_tsc,
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index a76c12b38e92..d28e32e79092 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -147,12 +147,14 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	}
 
 	preempt_disable();
+	hard_cond_local_irq_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
 	update_task_stack(tsk);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	preempt_enable();
+	hard_cond_local_irq_enable();
 
 	memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
 
@@ -365,6 +367,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	vm86->saved_sp0 = tsk->thread.sp0;
 	lazy_save_gs(vm86->regs32.gs);
 
+	hard_cond_local_irq_disable();
 	/* make room for real-mode segments */
 	preempt_disable();
 	tsk->thread.sp0 += 16;
@@ -376,6 +379,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 
 	update_task_stack(tsk);
 	preempt_enable();
+	hard_cond_local_irq_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk->mm);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 484c32b7f79f..bb7780565099 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1076,23 +1076,29 @@ static void fetch_register_operand(struct operand *op)
 	}
 }
 
-static void emulator_get_fpu(void)
+static unsigned long emulator_get_fpu(void)
 {
-	fpregs_lock();
+	unsigned long flags;
+
+	flags = fpregs_lock();
 
 	fpregs_assert_state_consistent();
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		switch_fpu_return();
+
+	return flags;
 }
 
-static void emulator_put_fpu(void)
+static void emulator_put_fpu(unsigned long flags)
 {
-	fpregs_unlock();
+	fpregs_unlock(flags);
 }
 
 static void read_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data, int reg)
 {
-	emulator_get_fpu();
+	unsigned long flags;
+
+	flags = emulator_get_fpu();
 	switch (reg) {
 	case 0: asm("movdqa %%xmm0, %0" : "=m"(*data)); break;
 	case 1: asm("movdqa %%xmm1, %0" : "=m"(*data)); break;
@@ -1114,13 +1120,15 @@ static void read_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data, int reg)
 #endif
 	default: BUG();
 	}
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 }
 
 static void write_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
 			  int reg)
 {
-	emulator_get_fpu();
+	unsigned long flags;
+
+	flags = emulator_get_fpu();
 	switch (reg) {
 	case 0: asm("movdqa %0, %%xmm0" : : "m"(*data)); break;
 	case 1: asm("movdqa %0, %%xmm1" : : "m"(*data)); break;
@@ -1142,12 +1150,14 @@ static void write_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
 #endif
 	default: BUG();
 	}
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 }
 
 static void read_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
 {
-	emulator_get_fpu();
+	unsigned long flags;
+
+	flags = emulator_get_fpu();
 	switch (reg) {
 	case 0: asm("movq %%mm0, %0" : "=m"(*data)); break;
 	case 1: asm("movq %%mm1, %0" : "=m"(*data)); break;
@@ -1159,12 +1169,14 @@ static void read_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
 	case 7: asm("movq %%mm7, %0" : "=m"(*data)); break;
 	default: BUG();
 	}
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 }
 
 static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
 {
-	emulator_get_fpu();
+	unsigned long flags;
+
+	flags = emulator_get_fpu();
 	switch (reg) {
 	case 0: asm("movq %0, %%mm0" : : "m"(*data)); break;
 	case 1: asm("movq %0, %%mm1" : : "m"(*data)); break;
@@ -1176,30 +1188,33 @@ static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
 	case 7: asm("movq %0, %%mm7" : : "m"(*data)); break;
 	default: BUG();
 	}
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 }
 
 static int em_fninit(struct x86_emulate_ctxt *ctxt)
 {
+	unsigned long flags;
+
 	if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
 		return emulate_nm(ctxt);
 
-	emulator_get_fpu();
+	flags = emulator_get_fpu();
 	asm volatile("fninit");
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 	return X86EMUL_CONTINUE;
 }
 
 static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
 {
+	unsigned long flags;
 	u16 fcw;
 
 	if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
 		return emulate_nm(ctxt);
 
-	emulator_get_fpu();
+	flags = emulator_get_fpu();
 	asm volatile("fnstcw %0": "+m"(fcw));
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 
 	ctxt->dst.val = fcw;
 
@@ -1208,14 +1223,15 @@ static int em_fnstcw(struct x86_emulate_ctxt *ctxt)
 
 static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
 {
+	unsigned long flags;
 	u16 fsw;
 
 	if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
 		return emulate_nm(ctxt);
 
-	emulator_get_fpu();
+	flags = emulator_get_fpu();
 	asm volatile("fnstsw %0": "+m"(fsw));
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 
 	ctxt->dst.val = fsw;
 
@@ -4117,17 +4133,18 @@ static inline size_t fxstate_size(struct x86_emulate_ctxt *ctxt)
 static int em_fxsave(struct x86_emulate_ctxt *ctxt)
 {
 	struct fxregs_state fx_state;
+	unsigned long flags;
 	int rc;
 
 	rc = check_fxsr(ctxt);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	emulator_get_fpu();
+	flags = emulator_get_fpu();
 
 	rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));
 
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
@@ -4159,6 +4176,7 @@ static noinline int fxregs_fixup(struct fxregs_state *fx_state,
 static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
 {
 	struct fxregs_state fx_state;
+	unsigned long flags;
 	int rc;
 	size_t size;
 
@@ -4171,7 +4189,7 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	emulator_get_fpu();
+	flags = emulator_get_fpu();
 
 	if (size < __fxstate_size(16)) {
 		rc = fxregs_fixup(&fx_state, size);
@@ -4188,7 +4206,7 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
 		rc = asm_safe("fxrstor %[fx]", : [fx] "m"(fx_state));
 
 out:
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 
 	return rc;
 }
@@ -5503,11 +5521,12 @@ static bool string_insn_completed(struct x86_emulate_ctxt *ctxt)
 
 static int flush_pending_x87_faults(struct x86_emulate_ctxt *ctxt)
 {
+	unsigned long flags;
 	int rc;
 
-	emulator_get_fpu();
+	flags = emulator_get_fpu();
 	rc = asm_safe("fwait");
-	emulator_put_fpu();
+	emulator_put_fpu(flags);
 
 	if (unlikely(rc != X86EMUL_CONTINUE))
 		return emulate_exception(ctxt, MF_VECTOR, 0, false);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c79c1a07f44b..0df921e54689 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5703,7 +5703,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
 
-	local_irq_enable();
+	hard_local_irq_enable();
 
 	asm volatile (
 		"push %%" _ASM_BP "; \n\t"
@@ -5827,7 +5827,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	reload_tss(vcpu);
 
-	local_irq_disable();
+	hard_local_irq_disable();
 
 	x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2a1ed3aae100..c0e802771656 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1217,19 +1217,23 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
 #ifdef CONFIG_X86_64
 static u64 vmx_read_guest_kernel_gs_base(struct vcpu_vmx *vmx)
 {
-	preempt_disable();
+	unsigned long flags;
+
+	flags = hard_preempt_disable();
 	if (vmx->guest_state_loaded)
 		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
-	preempt_enable();
+	hard_preempt_enable(flags);
 	return vmx->msr_guest_kernel_gs_base;
 }
 
 static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data)
 {
-	preempt_disable();
+	unsigned long flags;
+
+	flags = hard_preempt_disable();
 	if (vmx->guest_state_loaded)
 		wrmsrl(MSR_KERNEL_GS_BASE, data);
-	preempt_enable();
+	hard_preempt_enable(flags);
 	vmx->msr_guest_kernel_gs_base = data;
 }
 #endif
@@ -1657,6 +1661,7 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 {
 	int save_nmsrs, index;
 
+	hard_cond_local_irq_disable();
 	save_nmsrs = 0;
 #ifdef CONFIG_X86_64
 	/*
@@ -1684,6 +1689,7 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 
 	vmx->save_nmsrs = save_nmsrs;
 	vmx->guest_msrs_ready = false;
+	hard_cond_local_irq_enable();
 
 	if (cpu_has_vmx_msr_bitmap())
 		vmx_update_msr_bitmap(&vmx->vcpu);
@@ -2159,9 +2165,22 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			u64 old_msr_data = msr->data;
 			msr->data = data;
 			if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
+				unsigned long flags;
+
 				preempt_disable();
+				flags = hard_cond_local_irq_save();
+				/*
+				 * This may be called without an ipipe notifier
+				 * registered, i.e. outside of vcpu_run. In
+				 * that case, shared MSRs may be set to guest
+				 * state without the I-pipe having a chance to
+				 * restore them on a later preemption.
+				 * Therefore register the notifier here.
+				 */
+				__ipipe_enter_vm(&vcpu->ipipe_notifier);
 				ret = kvm_set_shared_msr(msr->index, msr->data,
 							 msr->mask);
+				hard_cond_local_irq_restore(flags);
 				preempt_enable();
 				if (ret)
 					msr->data = old_msr_data;
@@ -6754,7 +6773,9 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	vmx_vcpu_load(&vmx->vcpu, cpu);
 	vmx->vcpu.cpu = cpu;
 	vmx_vcpu_setup(vmx);
+	hard_cond_local_irq_disable();
 	vmx_vcpu_put(&vmx->vcpu);
+	hard_cond_local_irq_enable();
 	put_cpu();
 	if (cpu_need_virtualize_apic_accesses(&vmx->vcpu)) {
 		err = alloc_apic_access_page(kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 12e83297ea02..8072c26e88a8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -39,6 +39,7 @@
 #include <linux/iommu.h>
 #include <linux/intel-iommu.h>
 #include <linux/cpufreq.h>
+#include <linux/ipipe.h>
 #include <linux/user-return-notifier.h>
 #include <linux/srcu.h>
 #include <linux/slab.h>
@@ -169,6 +170,7 @@ struct kvm_shared_msrs_global {
 struct kvm_shared_msrs {
 	struct user_return_notifier urn;
 	bool registered;
+	bool dirty;
 	struct kvm_shared_msr_values {
 		u64 host;
 		u64 curr;
@@ -235,12 +237,31 @@ static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
 		vcpu->arch.apf.gfns[i] = ~0;
 }
 
+static void kvm_restore_shared_msrs(struct kvm_shared_msrs *locals)
+{
+	struct kvm_shared_msr_values *values;
+	unsigned long flags;
+	unsigned int slot;
+
+	flags = hard_cond_local_irq_save();
+	if (locals->dirty) {
+		for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
+			values = &locals->values[slot];
+			if (values->host != values->curr) {
+				wrmsrl(shared_msrs_global.msrs[slot],
+				       values->host);
+				values->curr = values->host;
+			}
+		}
+		locals->dirty = false;
+	}
+	hard_cond_local_irq_restore(flags);
+}
+
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
-	unsigned slot;
 	struct kvm_shared_msrs *locals
 		= container_of(urn, struct kvm_shared_msrs, urn);
-	struct kvm_shared_msr_values *values;
 	unsigned long flags;
 
 	/*
@@ -253,13 +274,8 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
 		user_return_notifier_unregister(urn);
 	}
 	local_irq_restore(flags);
-	for (slot = 0; slot < shared_msrs_global.nr; ++slot) {
-		values = &locals->values[slot];
-		if (values->host != values->curr) {
-			wrmsrl(shared_msrs_global.msrs[slot], values->host);
-			values->curr = values->host;
-		}
-	}
+	kvm_restore_shared_msrs(locals);
+	__ipipe_exit_vm();
 }
 
 static void shared_msr_update(unsigned slot, u32 msr)
@@ -310,6 +326,7 @@ int kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
 		return 1;
 
 	smsr->values[slot].curr = value;
+	smsr->dirty = true;
 	if (!smsr->registered) {
 		smsr->urn.on_user_return = kvm_on_user_return;
 		user_return_notifier_register(&smsr->urn);
@@ -3568,11 +3585,23 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	unsigned long flags;
 	int idx;
 
 	if (vcpu->preempted)
 		vcpu->arch.preempted_in_kernel = !kvm_x86_ops->get_cpl(vcpu);
 
+	flags = hard_cond_local_irq_save();
+
+	/*
+	 * Do not update steal time accounting while running over the head
+	 * domain as this may introduce high latencies and will also issue
+	 * context violation reports. The code will be executed when kvm does
+	 * the regular kvm_arch_vcpu_put, after returning from the head domain.
+	 */
+	if (!ipipe_root_p)
+		goto skip_steal_time_update;
+
 	/*
 	 * Disable page faults because we're in atomic context here.
 	 * kvm_write_guest_offset_cached() would call might_fault()
@@ -3590,6 +3619,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_steal_time_set_preempted(vcpu);
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	pagefault_enable();
+skip_steal_time_update:
 	kvm_x86_ops->vcpu_put(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
 	/*
@@ -3598,7 +3628,42 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	 * guest. do_debug expects dr6 to be cleared after it runs, do the same.
 	 */
 	set_debugreg(0, 6);
+
+#ifdef CONFIG_IPIPE
+	vcpu->ipipe_put_vcpu = false;
+	if (!per_cpu_ptr(shared_msrs, smp_processor_id())->dirty)
+		__ipipe_exit_vm();
+#endif
+
+	hard_cond_local_irq_restore(flags);
+}
+
+#ifdef CONFIG_IPIPE
+
+void __ipipe_handle_vm_preemption(struct ipipe_vm_notifier *nfy)
+{
+	unsigned int cpu = raw_smp_processor_id();
+	struct kvm_shared_msrs *smsr = per_cpu_ptr(shared_msrs, cpu);
+	struct kvm_vcpu *vcpu;
+
+	vcpu = container_of(nfy, struct kvm_vcpu, ipipe_notifier);
+
+	/*
+	 * We may leave kvm_arch_vcpu_put with the ipipe notifier still
+	 * registered in case shared MSRs are still active. If a VM preemption
+	 * hits us after that point but before the user return notifier has
+	 * fired, we may run kvm_arch_vcpu_put again from here. Do not rely on
+	 * this being harmless; use a flag to decide whether the run is needed.
+	 */
+	if (vcpu->ipipe_put_vcpu)
+		kvm_arch_vcpu_put(vcpu);
+
+	kvm_restore_shared_msrs(smsr);
+	__ipipe_exit_vm();
 }
+EXPORT_SYMBOL_GPL(__ipipe_handle_vm_preemption);
+
+#endif
 
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
@@ -8189,6 +8254,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	}
 
 	preempt_disable();
+	local_irq_disable();
+	hard_cond_local_irq_disable();
+
+#ifdef CONFIG_IPIPE
+	__ipipe_enter_vm(&vcpu->ipipe_notifier);
+	vcpu->ipipe_put_vcpu = true;
+#endif
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
 
@@ -8197,7 +8269,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * IPI are then delayed after guest entry, which ensures that they
 	 * result in virtual interrupt delivery.
 	 */
-	local_irq_disable();
 	vcpu->mode = IN_GUEST_MODE;
 
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
@@ -8227,6 +8298,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
+		hard_cond_local_irq_enable();
 		local_irq_enable();
 		preempt_enable();
 		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
@@ -8293,6 +8365,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	kvm_x86_ops->handle_exit_irqoff(vcpu);
 
+	hard_cond_local_irq_enable();
+
 	/*
 	 * Consume any pending interrupts, including the possible source of
 	 * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
@@ -8535,7 +8609,9 @@ static void kvm_save_current_fpu(struct fpu *fpu)
 /* Swap (qemu) user FPU context for the guest FPU context. */
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	fpregs_lock();
+	unsigned long flags;
+
+	flags = fpregs_lock();
 
 	kvm_save_current_fpu(vcpu->arch.user_fpu);
 
@@ -8544,7 +8620,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 				~XFEATURE_MASK_PKRU);
 
 	fpregs_mark_activate();
-	fpregs_unlock();
+	fpregs_unlock(flags);
 
 	trace_kvm_fpu(1);
 }
@@ -8552,14 +8628,16 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 /* When vcpu_run ends, restore user space FPU context. */
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	fpregs_lock();
+	unsigned long flags;
+
+	flags = fpregs_lock();
 
 	kvm_save_current_fpu(vcpu->arch.guest_fpu);
 
 	copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);
 
 	fpregs_mark_activate();
-	fpregs_unlock();
+	fpregs_unlock(flags);
 
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
@@ -9169,6 +9247,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 		"guest TSC will not be reliable\n");
 
 	vcpu = kvm_x86_ops->vcpu_create(kvm, id);
+#ifdef CONFIG_IPIPE
+	vcpu->ipipe_notifier.handler = __ipipe_handle_vm_preemption;
+#endif
 
 	return vcpu;
 }
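
The KVM changes above introduce a small protocol with the pipeline: the host declares guest entry by registering an ipipe_vm_notifier, whose handler the I-pipe core invokes if the head domain preempts the VM, so that host-side MSR state can be restored before real-time code runs; __ipipe_exit_vm() disarms it. A hedged sketch restating the calls used in vcpu_enter_guest() and kvm_arch_vcpu_put(), with hypothetical example_* names:

	struct example_vm {
		struct ipipe_vm_notifier notifier;
	};

	static void example_vm_preemption(struct ipipe_vm_notifier *nfy)
	{
		/* restore host registers the guest may still own, then disarm */
		__ipipe_exit_vm();
	}

	static void example_enter_guest(struct example_vm *vm)
	{
		vm->notifier.handler = example_vm_preemption;
		hard_cond_local_irq_disable();
		__ipipe_enter_vm(&vm->notifier);	/* armed until __ipipe_exit_vm() */
		/* ... switch to guest mode here ... */
		hard_cond_local_irq_enable();
	}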
diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
index 4321fa02e18d..1d9ad8f0bc4b 100644
--- a/arch/x86/lib/mmx_32.c
+++ b/arch/x86/lib/mmx_32.c
@@ -31,7 +31,7 @@ void *_mmx_memcpy(void *to, const void *from, size_t len)
 	void *p;
 	int i;
 
-	if (unlikely(in_interrupt()))
+	if (unlikely(!ipipe_root_p || in_interrupt()))
 		return __memcpy(to, from, len);
 
 	p = to;
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 3f435d7fca5e..1168c90acd88 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/uaccess.h>
+#include <linux/ipipe.h>
 #include <linux/export.h>
 
 #include <asm/tlbflush.h>
@@ -18,7 +19,7 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 {
 	unsigned long ret;
 
-	if (__range_not_ok(from, n, TASK_SIZE))
+	if (!ipipe_root_p || __range_not_ok(from, n, TASK_SIZE))
 		return n;
 
 	if (!nmi_uaccess_okay())
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c494c8c05824..0012982fb176 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1516,6 +1516,12 @@ static noinline void
 __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		unsigned long address)
 {
+#ifdef CONFIG_IPIPE
+	if (ipipe_root_domain != ipipe_head_domain) {
+		trace_hardirqs_on();
+		hard_local_irq_enable();
+	}
+#endif
 	prefetchw(&current->mm->mmap_sem);
 
 	if (unlikely(kmmio_fault(regs, address)))
@@ -1553,3 +1559,50 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long addr
 	exception_exit(prev_state);
 }
 NOKPROBE_SYMBOL(do_page_fault);
+
+#ifdef CONFIG_IPIPE
+
+void __ipipe_pin_mapping_globally(unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_32
+	unsigned long next, addr = start;
+
+	do {
+		unsigned long flags;
+		struct page *page;
+
+		next = pgd_addr_end(addr, end);
+		spin_lock_irqsave(&pgd_lock, flags);
+		list_for_each_entry(page, &pgd_list, lru)
+			vmalloc_sync_one(page_address(page), addr);
+		spin_unlock_irqrestore(&pgd_lock, flags);
+
+	} while (addr = next, addr != end);
+#else
+	unsigned long next, addr = start;
+	pgd_t *pgd, *pgd_ref;
+	struct page *page;
+
+	if (!(start >= VMALLOC_START && start < VMALLOC_END))
+		return;
+
+	do {
+		next = pgd_addr_end(addr, end);
+		pgd_ref = pgd_offset_k(addr);
+		if (pgd_none(*pgd_ref))
+			continue;
+		spin_lock(&pgd_lock);
+		list_for_each_entry(page, &pgd_list, lru) {
+			pgd = page_address(page) + pgd_index(addr);
+			if (pgd_none(*pgd))
+				set_pgd(pgd, *pgd_ref);
+		}
+		spin_unlock(&pgd_lock);
+		addr = next;
+	} while (addr != end);
+
+	arch_flush_lazy_mmu_mode();
+#endif
+}
+
+#endif /* CONFIG_IPIPE */
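
__ipipe_pin_mapping_globally() exists because the head domain cannot take the lazy vmalloc fault: any vmalloc'ed area which may be touched from out-of-band context must have its page tables synced into every pgd up front. A hypothetical allocation helper using it (a sketch, not part of the patch):

	#include <linux/vmalloc.h>

	static void *rt_alloc_pinned(size_t size)
	{
		void *p = vmalloc(size);

		if (p)
			__ipipe_pin_mapping_globally((unsigned long)p,
						     (unsigned long)p + size);
		return p;
	}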
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index d9fbd4f69920..9ab41ba80af8 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -772,6 +772,7 @@ void io_free_memtype(resource_size_t start, resource_size_t end)
 	free_memtype(start, end);
 }
 
+#ifdef CONFIG_X86_PAT
 int arch_io_reserve_memtype_wc(resource_size_t start, resource_size_t size)
 {
 	enum page_cache_mode type = _PAGE_CACHE_MODE_WC;
@@ -785,6 +786,7 @@ void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size)
 	io_free_memtype(start, start + size);
 }
 EXPORT_SYMBOL(arch_io_free_memtype_wc);
+#endif
 
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 				unsigned long size, pgprot_t vma_prot)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e6a9edc5baaf..67d6f3955d32 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -156,9 +156,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	switch_mm_irqs_off(prev, next, tsk);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 static void sync_current_stack_to_mm(struct mm_struct *mm)
@@ -278,7 +278,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
 	bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy);
-	unsigned cpu = smp_processor_id();
+	unsigned cpu = raw_smp_processor_id();
 	u64 next_tlb_gen;
 	bool need_flush;
 	u16 new_asid;
@@ -292,8 +292,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * NB: leave_mm() calls us with prev == NULL and tsk == NULL.
 	 */
 
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_IPIPE_DEBUG_INTERNAL) &&
+		     !hard_irqs_disabled());
+
 	/* We don't want flush_tlb_func_* to run concurrently with us. */
-	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
+	if (!IS_ENABLED(CONFIG_IPIPE) && IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());
 
 	/*
@@ -530,16 +533,30 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 	 * - f->new_tlb_gen: the generation that the requester of the flush
 	 *                   wants us to catch up to.
 	 */
-	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-	u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
-	u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
+	struct mm_struct *loaded_mm;
+	u32 loaded_mm_asid;
+	u64 mm_tlb_gen;
+	u64 local_tlb_gen;
+	unsigned long flags;
 
 	/* This code cannot presently handle being reentered. */
 	VM_WARN_ON(!irqs_disabled());
 
-	if (unlikely(loaded_mm == &init_mm))
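+	/*
+	 * Sample the per-CPU TLB state with hard irqs off, so that a
+	 * preemption from the head stage cannot change it under us.
+	 */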
+	flags = hard_cond_local_irq_save();
+
+	loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
+	local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
+
+	if (unlikely(loaded_mm == &init_mm)) {
+		hard_cond_local_irq_restore(flags);
 		return;
+	}
 
 	VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
 		   loaded_mm->context.ctx_id);
@@ -555,10 +569,12 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 		 * IPIs to lazy TLB mode CPUs.
 		 */
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
+		hard_cond_local_irq_restore(flags);
 		return;
 	}
 
 	if (unlikely(local_tlb_gen == mm_tlb_gen)) {
+		hard_cond_local_irq_restore(flags);
 		/*
 		 * There's nothing to do: we're already up to date.  This can
 		 * happen if two concurrent flushes happen -- the first flush to
@@ -572,6 +588,8 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 	WARN_ON_ONCE(local_tlb_gen > mm_tlb_gen);
 	WARN_ON_ONCE(f->new_tlb_gen > mm_tlb_gen);
 
+	hard_cond_local_irq_restore(flags);
+
 	/*
 	 * If we get to this point, we know that our TLB is out of date.
 	 * This does not strictly imply that we need to flush (it's
@@ -631,8 +649,13 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 		trace_tlb_flush(reason, TLB_FLUSH_ALL);
 	}
 
-	/* Both paths above update our state to mm_tlb_gen. */
-	this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
+	flags = hard_cond_local_irq_save();
+	if (loaded_mm == this_cpu_read(cpu_tlbstate.loaded_mm) &&
+	    loaded_mm_asid == this_cpu_read(cpu_tlbstate.loaded_mm_asid)) {
+		/* Both paths above update our state to mm_tlb_gen. */
+		this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
+	}
+	hard_cond_local_irq_restore(flags);
 }
 
 static void flush_tlb_func_local(const void *info, enum tlb_flush_reason reason)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index ddfbd62d8bfc..21c46ca878b8 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3332,6 +3332,17 @@ EXPORT_SYMBOL(dev_printk_emit);
 static void __dev_printk(const char *level, const struct device *dev,
 			struct va_format *vaf)
 {
+#ifdef CONFIG_IPIPE
+	/*
+	 * Defer console output to the pipeline log when hard irqs are
+	 * disabled or when we are not running over the root stage.
+	 */
+	if (hard_irqs_disabled() || !ipipe_root_p) {
+		__ipipe_log_printk(vaf->fmt, *vaf->va);
+		return;
+	}
+#endif
+
 	if (dev)
 		dev_printk_emit(level[1] - '0', dev, "%s %s: %pV",
 				dev_driver_string(dev), dev_name(dev), vaf);
diff --git a/drivers/base/regmap/regmap-irq.c b/drivers/base/regmap/regmap-irq.c
index 3d64c9331a82..addf18030b95 100644
--- a/drivers/base/regmap/regmap-irq.c
+++ b/drivers/base/regmap/regmap-irq.c
@@ -215,6 +215,7 @@ static void regmap_irq_enable(struct irq_data *data)
 	struct regmap *map = d->map;
 	const struct regmap_irq *irq_data = irq_to_regmap_irq(d, data->hwirq);
 	unsigned int mask, type;
+	unsigned long flags;
 
 	type = irq_data->type.type_falling_val | irq_data->type.type_rising_val;
 
@@ -237,7 +238,9 @@ static void regmap_irq_enable(struct irq_data *data)
 	if (d->chip->clear_on_unmask)
 		d->clear_status = true;
 
+	flags = hard_cond_local_irq_save();
 	d->mask_buf[irq_data->reg_offset / map->reg_stride] &= ~mask;
+	hard_cond_local_irq_restore(flags);
 }
 
 static void regmap_irq_disable(struct irq_data *data)
@@ -245,8 +248,11 @@ static void regmap_irq_disable(struct irq_data *data)
 	struct regmap_irq_chip_data *d = irq_data_get_irq_chip_data(data);
 	struct regmap *map = d->map;
 	const struct regmap_irq *irq_data = irq_to_regmap_irq(d, data->hwirq);
+	unsigned long flags;
 
+	flags = hard_cond_local_irq_save();
 	d->mask_buf[irq_data->reg_offset / map->reg_stride] |= irq_data->mask;
+	hard_cond_local_irq_restore(flags);
 }
 
 static int regmap_irq_set_type(struct irq_data *data, unsigned int type)
@@ -324,6 +330,7 @@ static const struct irq_chip regmap_irq_chip = {
 	.irq_enable		= regmap_irq_enable,
 	.irq_set_type		= regmap_irq_set_type,
 	.irq_set_wake		= regmap_irq_set_wake,
+	.flags			= IRQCHIP_PIPELINE_SAFE,
 };
 
 static inline int read_sub_irq_data(struct regmap_irq_chip_data *data,
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 4be83b4de2a0..206c40ad88ec 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -17,6 +17,8 @@
 #include <linux/clockchips.h>
 #include <linux/clocksource.h>
 #include <linux/interrupt.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
 #include <linux/of_irq.h>
 #include <linux/of_address.h>
 #include <linux/io.h>
@@ -631,8 +633,7 @@ static bool arch_timer_counter_has_wa(void)
 #define arch_timer_counter_has_wa()			({false;})
 #endif /* CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND */
 
-static __always_inline irqreturn_t timer_handler(const int access,
-					struct clock_event_device *evt)
+static int arch_timer_ack(const int access, struct clock_event_device *evt)
 {
 	unsigned long ctrl;
 
@@ -640,6 +641,57 @@ static __always_inline irqreturn_t timer_handler(const int access,
 	if (ctrl & ARCH_TIMER_CTRL_IT_STAT) {
 		ctrl |= ARCH_TIMER_CTRL_IT_MASK;
 		arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, evt);
+		return 1;
+	}
+	return 0;
+}
+
+#ifdef CONFIG_IPIPE
+static DEFINE_PER_CPU(struct ipipe_timer, arch_itimer);
+static struct __ipipe_tscinfo tsc_info = {
+	.type = IPIPE_TSC_TYPE_FREERUNNING_ARCH,
+	.u = {
+		{
+			.mask = 0xffffffffffffffff,
+		},
+	},
+};
+
+static void arch_itimer_ack_phys(void)
+{
+	struct clock_event_device *evt = this_cpu_ptr(arch_timer_evt);
+	arch_timer_ack(ARCH_TIMER_PHYS_ACCESS, evt);
+}
+
+static void arch_itimer_ack_virt(void)
+{
+	struct clock_event_device *evt = this_cpu_ptr(arch_timer_evt);
+	arch_timer_ack(ARCH_TIMER_VIRT_ACCESS, evt);
+}
+#endif /* CONFIG_IPIPE */
+
+static inline irqreturn_t timer_handler(int irq, const int access,
+					struct clock_event_device *evt)
+{
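+	/*
+	 * If the co-kernel stole this clock event device, the pipeline
+	 * entry code already acked the hardware through the ipipe_timer
+	 * ack handler; jump straight to the event handler.
+	 */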
+	if (clockevent_ipipe_stolen(evt))
+		goto stolen;
+
+	if (arch_timer_ack(access, evt)) {
+#ifdef CONFIG_IPIPE
+		struct ipipe_timer *itimer = raw_cpu_ptr(&arch_itimer);
+		if (itimer->irq != irq)
+			itimer->irq = irq;
+#endif /* CONFIG_IPIPE */
+	  stolen:
+		/*
+		 * This is a 64bit clock source, no need for TSC
+		 * update.
+		 */
 		evt->event_handler(evt);
 		return IRQ_HANDLED;
 	}
@@ -651,28 +698,28 @@ static irqreturn_t arch_timer_handler_virt(int irq, void *dev_id)
 {
 	struct clock_event_device *evt = dev_id;
 
-	return timer_handler(ARCH_TIMER_VIRT_ACCESS, evt);
+	return timer_handler(irq, ARCH_TIMER_VIRT_ACCESS, evt);
 }
 
 static irqreturn_t arch_timer_handler_phys(int irq, void *dev_id)
 {
 	struct clock_event_device *evt = dev_id;
 
-	return timer_handler(ARCH_TIMER_PHYS_ACCESS, evt);
+	return timer_handler(irq, ARCH_TIMER_PHYS_ACCESS, evt);
 }
 
 static irqreturn_t arch_timer_handler_phys_mem(int irq, void *dev_id)
 {
 	struct clock_event_device *evt = dev_id;
 
-	return timer_handler(ARCH_TIMER_MEM_PHYS_ACCESS, evt);
+	return timer_handler(irq, ARCH_TIMER_MEM_PHYS_ACCESS, evt);
 }
 
 static irqreturn_t arch_timer_handler_virt_mem(int irq, void *dev_id)
 {
 	struct clock_event_device *evt = dev_id;
 
-	return timer_handler(ARCH_TIMER_MEM_VIRT_ACCESS, evt);
+	return timer_handler(irq, ARCH_TIMER_MEM_VIRT_ACCESS, evt);
 }
 
 static __always_inline int timer_shutdown(const int access,
@@ -756,6 +803,18 @@ static void __arch_timer_setup(unsigned type,
 
 		arch_timer_check_ool_workaround(ate_match_local_cap_id, NULL);
 
+#ifdef CONFIG_IPIPE
+		clk->ipipe_timer = raw_cpu_ptr(&arch_itimer);
+		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
+			clk->ipipe_timer->irq = arch_timer_ppi[ARCH_TIMER_VIRT_PPI];
+			clk->ipipe_timer->ack = arch_itimer_ack_virt;
+		} else {
+			clk->ipipe_timer->irq = arch_timer_ppi[ARCH_TIMER_PHYS_SECURE_PPI];
+			clk->ipipe_timer->ack = arch_itimer_ack_phys;
+		}
+		clk->ipipe_timer->freq = arch_timer_rate;
+#endif
+
 		if (arch_timer_c3stop)
 			clk->features |= CLOCK_EVT_FEAT_C3STOP;
 		clk->name = "arch_sys_timer";
@@ -851,6 +910,9 @@ static void arch_counter_set_user_access(void)
 	else
 		cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN;
 
+#ifdef CONFIG_IPIPE
+	cntkctl |= ARCH_TIMER_USR_PCT_ACCESS_EN;
+#endif
 	arch_timer_set_cntkctl(cntkctl);
 }
 
@@ -995,6 +1057,10 @@ static void __init arch_counter_register(unsigned type)
 		arch_timer_read_counter = arch_counter_get_cntvct_mem;
 	}
 
+#ifdef CONFIG_IPIPE
+	tsc_info.freq = arch_timer_rate;
+	__ipipe_tsc_register(&tsc_info);
+#endif /* CONFIG_IPIPE */
 	if (!arch_counter_suspend_stop)
 		clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 	start_count = arch_timer_read_counter();
diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
index 88b2d38a7a61..cc0a8bc7f7e5 100644
--- a/drivers/clocksource/arm_global_timer.c
+++ b/drivers/clocksource/arm_global_timer.c
@@ -20,6 +20,7 @@
 #include <linux/of_irq.h>
 #include <linux/of_address.h>
 #include <linux/sched_clock.h>
+#include <linux/ipipe_tickdev.h>
 
 #include <asm/cputype.h>
 
@@ -46,10 +47,69 @@
  * the units for all operations.
  */
 static void __iomem *gt_base;
+static unsigned long gt_pbase;
+static struct clk *gt_clk;
 static unsigned long gt_clk_rate;
 static int gt_ppi;
 static struct clock_event_device __percpu *gt_evt;
 
+#ifdef CONFIG_IPIPE
+
+static struct clocksource gt_clocksource;
+
+static int gt_clockevent_ack(struct clock_event_device *evt);
+
+static DEFINE_PER_CPU(struct ipipe_timer, gt_itimer);
+
+static unsigned int refresh_gt_freq(void)
+{
+	gt_clk_rate = clk_get_rate(gt_clk);
+
+	__clocksource_update_freq_hz(&gt_clocksource, gt_clk_rate);
+
+	return gt_clk_rate;
+}
+
+static inline void gt_ipipe_cs_setup(void)
+{
+	struct __ipipe_tscinfo tsc_info = {
+		.type = IPIPE_TSC_TYPE_FREERUNNING,
+		.freq = gt_clk_rate,
+		.counter_vaddr = (unsigned long)gt_base,
+		.u = {
+			{
+				.counter_paddr = gt_pbase,
+				.mask = 0xffffffff,
+			}
+		},
+		.refresh_freq = refresh_gt_freq,
+	};
+
+	__ipipe_tsc_register(&tsc_info);
+}
+
+static void gt_itimer_ack(void)
+{
+	struct clock_event_device *evt = this_cpu_ptr(gt_evt);
+	gt_clockevent_ack(evt);
+}
+
+static inline void gt_ipipe_evt_setup(struct clock_event_device *evt)
+{
+	evt->ipipe_timer = this_cpu_ptr(&gt_itimer);
+	evt->ipipe_timer->irq = evt->irq;
+	evt->ipipe_timer->ack = gt_itimer_ack;
+	evt->ipipe_timer->freq = gt_clk_rate;
+}
+
+#else
+
+static inline void gt_ipipe_cs_setup(void) { }
+
+static inline void gt_ipipe_evt_setup(struct clock_event_device *evt) { }
+
+#endif /* CONFIG_IPIPE */
+
 /*
  * To get the value from the Global Timer Counter register proceed as follows:
  * 1. Read the upper 32-bit timer counter register
@@ -134,13 +194,11 @@ static int gt_clockevent_set_next_event(unsigned long evt,
 	return 0;
 }
 
-static irqreturn_t gt_clockevent_interrupt(int irq, void *dev_id)
+static int gt_clockevent_ack(struct clock_event_device *evt)
 {
-	struct clock_event_device *evt = dev_id;
-
 	if (!(readl_relaxed(gt_base + GT_INT_STATUS) &
 				GT_INT_STATUS_EVENT_FLAG))
-		return IRQ_NONE;
+		return IS_ENABLED(CONFIG_IPIPE);
 
 	/**
 	 * ERRATA 740657( Global Timer can send 2 interrupts for
@@ -153,10 +211,23 @@ static irqreturn_t gt_clockevent_interrupt(int irq, void *dev_id)
 	 *	the Global Timer flag _after_ having incremented
 	 *	the Comparator register	value to a higher value.
 	 */
-	if (clockevent_state_oneshot(evt))
+	if (clockevent_ipipe_stolen(evt) || clockevent_state_oneshot(evt))
 		gt_compare_set(ULONG_MAX, 0);
 
 	writel_relaxed(GT_INT_STATUS_EVENT_FLAG, gt_base + GT_INT_STATUS);
+
+	return 1;
+}
+
+static irqreturn_t gt_clockevent_interrupt(int irq, void *dev_id)
+{
+	struct clock_event_device *evt = dev_id;
+
+	if (!clockevent_ipipe_stolen(evt)) {
+		if (!gt_clockevent_ack(evt))
+			return IRQ_NONE;
+	}
+
 	evt->event_handler(evt);
 
 	return IRQ_HANDLED;
@@ -177,6 +248,7 @@ static int gt_starting_cpu(unsigned int cpu)
 	clk->cpumask = cpumask_of(cpu);
 	clk->rating = 300;
 	clk->irq = gt_ppi;
+	gt_ipipe_evt_setup(clk);
 	clockevents_config_and_register(clk, gt_clk_rate,
 					1, 0xffffffff);
 	enable_percpu_irq(clk->irq, IRQ_TYPE_NONE);
@@ -249,13 +321,14 @@ static int __init gt_clocksource_init(void)
 #ifdef CONFIG_CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
 	sched_clock_register(gt_sched_clock_read, 64, gt_clk_rate);
 #endif
+	gt_ipipe_cs_setup();
 	return clocksource_register_hz(&gt_clocksource, gt_clk_rate);
 }
 
 static int __init global_timer_of_register(struct device_node *np)
 {
-	struct clk *gt_clk;
 	int err = 0;
+	struct resource res;
 
 	/*
 	 * In A9 r2p0 the comparators for each processor with the global timer
@@ -280,6 +353,11 @@ static int __init global_timer_of_register(struct device_node *np)
 		return -ENXIO;
 	}
 
+	if (of_address_to_resource(np, 0, &res))
+		res.start = 0;
+
+	gt_pbase = res.start;
+
 	gt_clk = of_clk_get(np, 0);
 	if (!IS_ERR(gt_clk)) {
 		err = clk_prepare_enable(gt_clk);
diff --git a/drivers/clocksource/bcm2835_timer.c b/drivers/clocksource/bcm2835_timer.c
index b235f446ee50..24932b779ddd 100644
--- a/drivers/clocksource/bcm2835_timer.c
+++ b/drivers/clocksource/bcm2835_timer.c
@@ -16,6 +16,9 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/sched_clock.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
+#include <linux/time.h>
 
 #include <asm/irq.h>
 
@@ -26,6 +29,7 @@
 #define MAX_TIMER	3
 #define DEFAULT_TIMER	3
 
+
 struct bcm2835_timer {
 	void __iomem *control;
 	void __iomem *compare;
@@ -33,9 +37,53 @@ struct bcm2835_timer {
 	struct clock_event_device evt;
 	struct irqaction act;
 };
-
 static void __iomem *system_clock __read_mostly;
 
+#ifdef CONFIG_IPIPE
+
+static void __iomem *t_base;
+static unsigned long t_pbase;
+
+static inline void bcm2835_ipipe_cs_setup(unsigned int freq)
+{
+	struct __ipipe_tscinfo tsc_info = {
+		.type = IPIPE_TSC_TYPE_FREERUNNING,
+		.freq = freq,
+		.counter_vaddr = (unsigned long)t_base + 0x04,
+		.u = {
+			{
+				.counter_paddr = t_pbase + 0x04,
+				.mask = 0xffffffff,
+			}
+		},
+	};
+
+	__ipipe_tsc_register(&tsc_info);
+}
+
+static struct ipipe_timer bcm2835_itimer;
+
+static void bcm2835_itimer_ack(void)
+{
+	struct bcm2835_timer *timer = container_of(bcm2835_itimer.host_timer,
+							struct bcm2835_timer, evt);
+	writel(timer->match_mask, timer->control);
+}
+
+static inline void bcm2835_ipipe_evt_setup(struct clock_event_device *evt,
+					   int freq)
+{
+	evt->ipipe_timer = &bcm2835_itimer;
+	evt->ipipe_timer->irq = evt->irq;
+	evt->ipipe_timer->ack = bcm2835_itimer_ack;
+	evt->ipipe_timer->freq = freq;
+}
+
+#else
+static inline void bcm2835_ipipe_cs_setup(unsigned int freq) { }
+static inline void bcm2835_ipipe_evt_setup(struct clock_event_device *evt, int freq) { }
+#endif /* CONFIG_IPIPE */
+
 static u64 notrace bcm2835_sched_read(void)
 {
 	return readl_relaxed(system_clock);
@@ -46,8 +94,7 @@ static int bcm2835_time_set_next_event(unsigned long event,
 {
 	struct bcm2835_timer *timer = container_of(evt_dev,
 		struct bcm2835_timer, evt);
-	writel_relaxed(readl_relaxed(system_clock) + event,
-		timer->compare);
+	writel_relaxed(readl_relaxed(system_clock) + event, timer->compare);
 	return 0;
 }
 
@@ -55,9 +102,13 @@ static irqreturn_t bcm2835_time_interrupt(int irq, void *dev_id)
 {
 	struct bcm2835_timer *timer = dev_id;
 	void (*event_handler)(struct clock_event_device *);
+
+	if (clockevent_ipipe_stolen(&timer->evt)) {
+		goto handle;
+	}
 	if (readl_relaxed(timer->control) & timer->match_mask) {
 		writel_relaxed(timer->match_mask, timer->control);
-
+		handle:
 		event_handler = READ_ONCE(timer->evt.event_handler);
 		if (event_handler)
 			event_handler(&timer->evt);
@@ -80,6 +131,18 @@ static int __init bcm2835_timer_init(struct device_node *node)
 		return -ENXIO;
 	}
 
+	if (IS_ENABLED(CONFIG_IPIPE)) {
+		struct resource res;
+		int ret;
+
+		ret = of_address_to_resource(node, 0, &res);
+		if (ret)
+			res.start = 0;
+
+		t_base = base;
+		t_pbase = res.start;
+	}
+
 	ret = of_property_read_u32(node, "clock-frequency", &freq);
 	if (ret) {
 		pr_err("Can't read clock-frequency\n");
@@ -114,10 +177,21 @@ static int __init bcm2835_timer_init(struct device_node *node)
 	timer->evt.set_next_event = bcm2835_time_set_next_event;
 	timer->evt.cpumask = cpumask_of(0);
 	timer->act.name = node->name;
-	timer->act.flags = IRQF_TIMER | IRQF_SHARED;
+	timer->act.flags = IRQF_TIMER;
 	timer->act.dev_id = timer;
 	timer->act.handler = bcm2835_time_interrupt;
 
+	if (IS_ENABLED(CONFIG_IPIPE)) {
+		bcm2835_ipipe_cs_setup(freq);
+		bcm2835_ipipe_evt_setup(&timer->evt, freq);
+		timer->evt.ipipe_timer = &bcm2835_itimer;
+		timer->evt.ipipe_timer->irq = irq;
+		timer->evt.ipipe_timer->ack = bcm2835_itimer_ack;
+		timer->evt.ipipe_timer->freq = freq;
+	} else {
+		timer->act.flags |= IRQF_SHARED;
+	}
+
 	ret = setup_irq(irq, &timer->act);
 	if (ret) {
 		pr_err("Can't set up timer IRQ\n");
diff --git a/drivers/clocksource/dw_apb_timer.c b/drivers/clocksource/dw_apb_timer.c
index 10ce69548f1b..f1e389689296 100644
--- a/drivers/clocksource/dw_apb_timer.c
+++ b/drivers/clocksource/dw_apb_timer.c
@@ -12,6 +12,7 @@
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
+#include <linux/ipipe.h>
 #include <linux/io.h>
 #include <linux/slab.h>
 
@@ -382,7 +383,7 @@ static void apbt_restart_clocksource(struct clocksource *cs)
  */
 struct dw_apb_clocksource *
 dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
-			unsigned long freq)
+			unsigned long phys, unsigned long freq)
 {
 	struct dw_apb_clocksource *dw_cs = kzalloc(sizeof(*dw_cs), GFP_KERNEL);
 
@@ -397,10 +398,22 @@ dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
 	dw_cs->cs.mask = CLOCKSOURCE_MASK(32);
 	dw_cs->cs.flags = CLOCK_SOURCE_IS_CONTINUOUS;
 	dw_cs->cs.resume = apbt_restart_clocksource;
+	dw_cs->phys = phys;
 
 	return dw_cs;
 }
 
+#ifdef CONFIG_IPIPE
+static struct __ipipe_tscinfo apb_tsc_info = {
+	.type = IPIPE_TSC_TYPE_FREERUNNING_COUNTDOWN,
+	.u = {
+		.dec = {
+			.mask = 0xffffffffU,
+		},
+	},
+};
+#endif
+
 /**
  * dw_apb_clocksource_register() - register the APB clocksource.
  *
@@ -409,6 +422,12 @@ dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
 void dw_apb_clocksource_register(struct dw_apb_clocksource *dw_cs)
 {
 	clocksource_register_hz(&dw_cs->cs, dw_cs->timer.freq);
+#ifdef CONFIG_IPIPE
+	apb_tsc_info.u.dec.counter = (void *)(dw_cs->phys + APBTMR_N_CURRENT_VALUE);
+	apb_tsc_info.counter_vaddr = (unsigned long)dw_cs->timer.base + APBTMR_N_CURRENT_VALUE;
+	apb_tsc_info.freq = dw_cs->timer.freq;
+	__ipipe_tsc_register(&apb_tsc_info);
+#endif
 }
 
 /**
diff --git a/drivers/clocksource/dw_apb_timer_of.c b/drivers/clocksource/dw_apb_timer_of.c
index 6921b91b61ef..e1d2dd5b83cf 100644
--- a/drivers/clocksource/dw_apb_timer_of.c
+++ b/drivers/clocksource/dw_apb_timer_of.c
@@ -15,17 +15,21 @@
 #include <linux/sched_clock.h>
 
 static void __init timer_get_base_and_rate(struct device_node *np,
-				    void __iomem **base, u32 *rate)
+					void __iomem **base, unsigned long *phys,
+					u32 *rate)
 {
 	struct clk *timer_clk;
+	struct resource res;
 	struct clk *pclk;
 	struct reset_control *rstc;
 
 	*base = of_iomap(np, 0);
 
-	if (!*base)
+	if (!*base || of_address_to_resource(np, 0, &res))
 		panic("Unable to map regs for %pOFn", np);
 
+	*phys = res.start;
+
 	/*
 	 * Reset the timer if the reset control is available, wiping
 	 * out the state the firmware may have left it
@@ -65,13 +69,14 @@ static void __init add_clockevent(struct device_node *event_timer)
 {
 	void __iomem *iobase;
 	struct dw_apb_clock_event_device *ced;
+	unsigned long phys;
 	u32 irq, rate;
 
 	irq = irq_of_parse_and_map(event_timer, 0);
 	if (irq == 0)
 		panic("No IRQ for clock event timer");
 
-	timer_get_base_and_rate(event_timer, &iobase, &rate);
+	timer_get_base_and_rate(event_timer, &iobase, &phys, &rate);
 
 	ced = dw_apb_clockevent_init(0, event_timer->name, 300, iobase, irq,
 				     rate);
@@ -88,11 +93,12 @@ static void __init add_clocksource(struct device_node *source_timer)
 {
 	void __iomem *iobase;
 	struct dw_apb_clocksource *cs;
+	unsigned long phys;
 	u32 rate;
 
-	timer_get_base_and_rate(source_timer, &iobase, &rate);
+	timer_get_base_and_rate(source_timer, &iobase, &phys, &rate);
 
-	cs = dw_apb_clocksource_init(300, source_timer->name, iobase, rate);
+	cs = dw_apb_clocksource_init(300, source_timer->name, iobase, phys, rate);
 	if (!cs)
 		panic("Unable to initialise clocksource device");
 
@@ -121,11 +127,12 @@ static const struct of_device_id sptimer_ids[] __initconst = {
 static void __init init_sched_clock(void)
 {
 	struct device_node *sched_timer;
+	unsigned long phys;
 
 	sched_timer = of_find_matching_node(NULL, sptimer_ids);
 	if (sched_timer) {
 		timer_get_base_and_rate(sched_timer, &sched_io_base,
-					&sched_rate);
+					&phys, &sched_rate);
 		of_node_put(sched_timer);
 	}
 
diff --git a/drivers/clocksource/timer-imx-gpt.c b/drivers/clocksource/timer-imx-gpt.c
index 706c0d0ff56c..1db8f5b8da93 100644
--- a/drivers/clocksource/timer-imx-gpt.c
+++ b/drivers/clocksource/timer-imx-gpt.c
@@ -16,6 +16,8 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
 #include <soc/imx/timer.h>
 
 /*
@@ -61,6 +63,9 @@
 
 struct imx_timer {
 	enum imx_gpt_type type;
+#ifdef CONFIG_IPIPE
+	unsigned long pbase;
+#endif
 	void __iomem *base;
 	int irq;
 	struct clk *clk_per;
@@ -252,6 +257,30 @@ static int mxc_set_oneshot(struct clock_event_device *ced)
 	return 0;
 }
 
+#ifdef CONFIG_IPIPE
+
+static struct imx_timer *global_imx_timer;
+
+static void mxc_timer_ack(void)
+{
+	global_imx_timer->gpt->gpt_irq_acknowledge(global_imx_timer);
+}
+
+static struct __ipipe_tscinfo tsc_info = {
+	.type = IPIPE_TSC_TYPE_FREERUNNING,
+	.u = {
+		{
+			.mask = 0xffffffff,
+		},
+	},
+};
+
+static struct ipipe_timer mxc_itimer = {
+	.ack = mxc_timer_ack,
+};
+
+#endif
+
 /*
  * IRQ handler for the timer
  */
@@ -263,7 +292,8 @@ static irqreturn_t mxc_timer_interrupt(int irq, void *dev_id)
 
 	tstat = readl_relaxed(imxtm->base + imxtm->gpt->reg_tstat);
 
-	imxtm->gpt->gpt_irq_acknowledge(imxtm);
+	if (!clockevent_ipipe_stolen(ced))
+		imxtm->gpt->gpt_irq_acknowledge(imxtm);
 
 	ced->event_handler(ced);
 
@@ -284,6 +314,9 @@ static int __init mxc_clockevent_init(struct imx_timer *imxtm)
 	ced->rating = 200;
 	ced->cpumask = cpumask_of(0);
 	ced->irq = imxtm->irq;
+#ifdef CONFIG_IPIPE
+	ced->ipipe_timer = &mxc_itimer;
+#endif
 	clockevents_config_and_register(ced, clk_get_rate(imxtm->clk_per),
 					0xff, 0xfffffffe);
 
@@ -423,6 +456,17 @@ static int __init _mxc_timer_init(struct imx_timer *imxtm)
 	if (ret)
 		return ret;
 
+#ifdef CONFIG_IPIPE
+	tsc_info.u.counter_paddr = imxtm->pbase + imxtm->gpt->reg_tcn;
+	tsc_info.counter_vaddr = (unsigned long)imxtm->base + imxtm->gpt->reg_tcn;
+	tsc_info.freq = clk_get_rate(imxtm->clk_per);
+	__ipipe_tsc_register(&tsc_info);
+	mxc_itimer.irq = imxtm->irq;
+	mxc_itimer.freq = clk_get_rate(imxtm->clk_per);
+	mxc_itimer.min_delay_ticks = ipipe_timer_ns2ticks(&mxc_itimer, 2000);
+	global_imx_timer = imxtm;
+#endif /* CONFIG_IPIPE */
+
 	return mxc_clockevent_init(imxtm);
 }
 
@@ -438,6 +482,9 @@ void __init mxc_timer_init(unsigned long pbase, int irq, enum imx_gpt_type type)
 
 	imxtm->base = ioremap(pbase, SZ_4K);
 	BUG_ON(!imxtm->base);
+#ifdef CONFIG_IPIPE
+	imxtm->pbase = pbase;
+#endif
 
 	imxtm->type = type;
 	imxtm->irq = irq;
@@ -449,6 +496,7 @@ static int __init mxc_timer_init_dt(struct device_node *np,  enum imx_gpt_type t
 {
 	struct imx_timer *imxtm;
 	static int initialized;
+	struct resource res;
 	int ret;
 
 	/* Support one instance only */
@@ -467,6 +515,13 @@ static int __init mxc_timer_init_dt(struct device_node *np,  enum imx_gpt_type t
 	if (imxtm->irq <= 0)
 		return -EINVAL;
 
+	if (of_address_to_resource(np, 0, &res))
+		res.start = 0;
+
+#ifdef CONFIG_IPIPE
+	imxtm->pbase = res.start;
+#endif
+
 	imxtm->clk_ipg = of_clk_get_by_name(np, "ipg");
 
 	/* Try osc_per first, and fall back to per otherwise */
diff --git a/drivers/clocksource/timer-sp804.c b/drivers/clocksource/timer-sp804.c
index 9c841980eed1..48a3c8df7971 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -17,11 +17,25 @@
 #include <linux/of_clk.h>
 #include <linux/of_irq.h>
 #include <linux/sched_clock.h>
+#include <linux/module.h>
+#include <linux/ipipe.h>
+#include <linux/ipipe_tickdev.h>
 
 #include <clocksource/timer-sp804.h>
 
 #include "timer-sp.h"
 
+#ifdef CONFIG_IPIPE
+static struct __ipipe_tscinfo tsc_info = {
+	.type = IPIPE_TSC_TYPE_FREERUNNING_COUNTDOWN,
+	.u = {
+		{
+			.mask = 0xffffffff,
+		},
+	},
+};
+#endif /* CONFIG_IPIPE */
+
 static long __init sp804_get_clock_rate(struct clk *clk)
 {
 	long rate;
@@ -66,6 +80,7 @@ void __init sp804_timer_disable(void __iomem *base)
 }
 
 int  __init __sp804_clocksource_and_sched_clock_init(void __iomem *base,
+						     unsigned long phys,
 						     const char *name,
 						     struct clk *clk,
 						     int use_sched_clock)
@@ -100,6 +115,12 @@ int  __init __sp804_clocksource_and_sched_clock_init(void __iomem *base,
 		sched_clock_register(sp804_read, 32, rate);
 	}
 
+#ifdef CONFIG_IPIPE
+	tsc_info.freq = rate;
+	tsc_info.counter_vaddr = (unsigned long)base + TIMER_VALUE;
+	tsc_info.u.counter_paddr = phys + TIMER_VALUE;
+	__ipipe_tsc_register(&tsc_info);
+#endif
 	return 0;
 }
 
@@ -214,6 +235,7 @@ static int __init sp804_of_init(struct device_node *np)
 	u32 irq_num = 0;
 	struct clk *clk1, *clk2;
 	const char *name = of_get_property(np, "compatible", NULL);
+	struct resource res;
 
 	base = of_iomap(np, 0);
 	if (!base)
@@ -247,6 +269,9 @@ static int __init sp804_of_init(struct device_node *np)
 	if (irq <= 0)
 		goto err;
 
+	if (of_address_to_resource(np, 0, &res))
+		res.start = 0;
+
 	of_property_read_u32(np, "arm,sp804-has-irq", &irq_num);
 	if (irq_num == 2) {
 
@@ -254,7 +279,7 @@ static int __init sp804_of_init(struct device_node *np)
 		if (ret)
 			goto err;
 
-		ret = __sp804_clocksource_and_sched_clock_init(base, name, clk1, 1);
+		ret = __sp804_clocksource_and_sched_clock_init(base, res.start, name, clk1, 1);
 		if (ret)
 			goto err;
 	} else {
@@ -264,7 +289,7 @@ static int __init sp804_of_init(struct device_node *np)
 			goto err;
 
 		ret =__sp804_clocksource_and_sched_clock_init(base + TIMER_2_BASE,
-							      name, clk2, 1);
+							      res.start, name, clk2, 1);
 		if (ret)
 			goto err;
 	}
@@ -284,6 +309,7 @@ static int __init integrator_cp_of_init(struct device_node *np)
 	int irq, ret = -EINVAL;
 	const char *name = of_get_property(np, "compatible", NULL);
 	struct clk *clk;
+	struct resource res;
 
 	base = of_iomap(np, 0);
 	if (!base) {
@@ -303,8 +329,11 @@ static int __init integrator_cp_of_init(struct device_node *np)
 	if (init_count == 2 || !of_device_is_available(np))
 		goto err;
 
+	if (of_address_to_resource(np, 0, &res))
+		res.start = 0;
+
 	if (!init_count) {
-		ret = __sp804_clocksource_and_sched_clock_init(base, name, clk, 0);
+		ret = __sp804_clocksource_and_sched_clock_init(base, res.start, name, clk, 0);
 		if (ret)
 			goto err;
 	} else {
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 73f08cda21e0..d771356ec79d 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -17,6 +17,7 @@
 #include <linux/pm_qos.h>
 #include <linux/cpu.h>
 #include <linux/cpuidle.h>
+#include <linux/ipipe.h>
 #include <linux/ktime.h>
 #include <linux/hrtimer.h>
 #include <linux/module.h>
@@ -205,6 +206,19 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	ktime_t time_start, time_end;
 
 	/*
+	 * A co-kernel running on the head stage of the IRQ pipeline
+	 * may deny switching to a deeper C-state. If so, call the
+	 * default idle routine instead. If the co-kernel cannot cope
+	 * with the latency induced by the default idling operation,
+	 * then CPUIDLE is not usable and should be disabled at build
+	 * time.
+	 */
+	if (!ipipe_enter_cpuidle(dev, target_state)) {
+		default_idle_call();
+		return -EBUSY;
+	}
+
+	/*
 	 * Tell the time framework to switch to a broadcast timer because our
 	 * local timer will be shut down.  If a local timer is used from another
 	 * CPU as a broadcast timer, this call may fail if it is not available.
@@ -228,6 +242,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 
 	stop_critical_timings();
 	entered_state = target_state->enter(dev, drv, index);
+	hard_cond_local_irq_enable();
 	start_critical_timings();
 
 	sched_clock_idle_wakeup_event();
diff --git a/drivers/gpio/gpio-davinci.c b/drivers/gpio/gpio-davinci.c
index e0b025689625..358254edbde6 100644
--- a/drivers/gpio/gpio-davinci.c
+++ b/drivers/gpio/gpio-davinci.c
@@ -22,7 +22,7 @@
 #include <linux/platform_data/gpio-davinci.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/spinlock.h>
-
+#include <linux/ipipe.h>
 #include <asm-generic/gpio.h>
 
 #define MAX_REGS_BANKS 5
@@ -333,7 +333,7 @@ static struct irq_chip gpio_irqchip = {
 	.irq_enable	= gpio_irq_enable,
 	.irq_disable	= gpio_irq_disable,
 	.irq_set_type	= gpio_irq_type,
-	.flags		= IRQCHIP_SET_TYPE_MASKED,
+	.flags		= IRQCHIP_SET_TYPE_MASKED | IRQCHIP_PIPELINE_SAFE,
 };
 
 static void gpio_irq_handler(struct irq_desc *desc)
@@ -376,7 +376,7 @@ static void gpio_irq_handler(struct irq_desc *desc)
 			 */
 			hw_irq = (bank_num / 2) * 32 + bit;
 
-			generic_handle_irq(
+			ipipe_handle_demuxed_irq(
 				irq_find_mapping(d->irq_domain, hw_irq));
 		}
 	}
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c
index 6c0687694341..1b30950c90d2 100644
--- a/drivers/gpio/gpio-mvebu.c
+++ b/drivers/gpio/gpio-mvebu.c
@@ -52,6 +52,7 @@
 #include <linux/pwm.h>
 #include <linux/regmap.h>
 #include <linux/slab.h>
+#include <linux/ipipe.h>
 
 /*
  * GPIO unit register offsets.
@@ -402,10 +403,11 @@ static void mvebu_gpio_irq_ack(struct irq_data *d)
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct mvebu_gpio_chip *mvchip = gc->private;
 	u32 mask = d->mask;
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	mvebu_gpio_write_edge_cause(mvchip, ~mask);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 static void mvebu_gpio_edge_irq_mask(struct irq_data *d)
@@ -414,11 +416,12 @@ static void mvebu_gpio_edge_irq_mask(struct irq_data *d)
 	struct mvebu_gpio_chip *mvchip = gc->private;
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
 	u32 mask = d->mask;
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	ct->mask_cache_priv &= ~mask;
 	mvebu_gpio_write_edge_mask(mvchip, ct->mask_cache_priv);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 static void mvebu_gpio_edge_irq_unmask(struct irq_data *d)
@@ -427,11 +430,12 @@ static void mvebu_gpio_edge_irq_unmask(struct irq_data *d)
 	struct mvebu_gpio_chip *mvchip = gc->private;
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
 	u32 mask = d->mask;
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	ct->mask_cache_priv |= mask;
 	mvebu_gpio_write_edge_mask(mvchip, ct->mask_cache_priv);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 static void mvebu_gpio_level_irq_mask(struct irq_data *d)
@@ -440,11 +444,12 @@ static void mvebu_gpio_level_irq_mask(struct irq_data *d)
 	struct mvebu_gpio_chip *mvchip = gc->private;
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
 	u32 mask = d->mask;
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	ct->mask_cache_priv &= ~mask;
 	mvebu_gpio_write_level_mask(mvchip, ct->mask_cache_priv);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 static void mvebu_gpio_level_irq_unmask(struct irq_data *d)
@@ -453,11 +458,12 @@ static void mvebu_gpio_level_irq_unmask(struct irq_data *d)
 	struct mvebu_gpio_chip *mvchip = gc->private;
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
 	u32 mask = d->mask;
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	ct->mask_cache_priv |= mask;
 	mvebu_gpio_write_level_mask(mvchip, ct->mask_cache_priv);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 /*****************************************************************************
@@ -591,7 +597,7 @@ static void mvebu_gpio_irq_handler(struct irq_desc *desc)
 				     polarity);
 		}
 
-		generic_handle_irq(irq);
+		ipipe_handle_demuxed_irq(irq);
 	}
 
 	chained_irq_exit(chip, desc);
@@ -1229,6 +1235,7 @@ static int mvebu_gpio_probe(struct platform_device *pdev)
 	ct->chip.irq_unmask = mvebu_gpio_level_irq_unmask;
 	ct->chip.irq_set_type = mvebu_gpio_irq_set_type;
 	ct->chip.name = mvchip->chip.label;
+	ct->chip.flags = IRQCHIP_PIPELINE_SAFE;
 
 	ct = &gc->chip_types[1];
 	ct->type = IRQ_TYPE_EDGE_RISING | IRQ_TYPE_EDGE_FALLING;
@@ -1238,6 +1245,7 @@ static int mvebu_gpio_probe(struct platform_device *pdev)
 	ct->chip.irq_set_type = mvebu_gpio_irq_set_type;
 	ct->handler = handle_edge_irq;
 	ct->chip.name = mvchip->chip.label;
+	ct->chip.flags = IRQCHIP_PIPELINE_SAFE;
 
 	/*
 	 * Setup the interrupt handlers. Each chip can have up to 4
diff --git a/drivers/gpio/gpio-mxc.c b/drivers/gpio/gpio-mxc.c
index c77d474185f3..86c7165f6660 100644
--- a/drivers/gpio/gpio-mxc.c
+++ b/drivers/gpio/gpio-mxc.c
@@ -22,6 +22,7 @@
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/bug.h>
+#include <linux/ipipe.h>
 
 enum mxc_gpio_hwtype {
 	IMX1_GPIO,	/* runs on i.mx1 */
@@ -266,7 +267,7 @@ static void mxc_gpio_irq_handler(struct mxc_gpio_port *port, u32 irq_stat)
 		if (port->both_edges & (1 << irqoffset))
 			mxc_flip_edge(port, irqoffset);
 
-		generic_handle_irq(irq_find_mapping(port->domain, irqoffset));
+		ipipe_handle_demuxed_irq(irq_find_mapping(port->domain, irqoffset));
 
 		irq_stat &= ~(1 << irqoffset);
 	}
@@ -359,7 +360,7 @@ static int mxc_gpio_init_gc(struct mxc_gpio_port *port, int irq_base)
 	ct->chip.irq_unmask = irq_gc_mask_set_bit;
 	ct->chip.irq_set_type = gpio_set_irq_type;
 	ct->chip.irq_set_wake = gpio_set_wake_irq;
-	ct->chip.flags = IRQCHIP_MASK_ON_SUSPEND;
+	ct->chip.flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE;
 	ct->regs.ack = GPIO_ISR;
 	ct->regs.mask = GPIO_IMR;
 
diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index d0f27084a942..2c8d54fa3e26 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -24,6 +24,7 @@
 #include <linux/of_device.h>
 #include <linux/gpio/driver.h>
 #include <linux/bitops.h>
+#include <linux/ipipe.h>
 #include <linux/platform_data/gpio-omap.h>
 
 #define OMAP4_GPIO_DEBOUNCINGTIME_MASK 0xFF
@@ -54,7 +55,11 @@ struct gpio_bank {
 	u32 saved_datain;
 	u32 level_mask;
 	u32 toggle_mask;
+#ifdef CONFIG_IPIPE
+	ipipe_spinlock_t lock;
+#else
 	raw_spinlock_t lock;
+#endif
 	raw_spinlock_t wa_lock;
 	struct gpio_chip chip;
 	struct clk *dbck;
@@ -553,18 +558,18 @@ static int omap_gpio_wake_enable(struct irq_data *d, unsigned int enable)
  * line's interrupt handler has been run, we may miss some nested
  * interrupts.
  */
-static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
+static void __omap_gpio_irq_handler(struct gpio_bank *bank)
 {
 	void __iomem *isr_reg = NULL;
 	u32 enabled, isr, edge;
 	unsigned int bit;
-	struct gpio_bank *bank = gpiobank;
 	unsigned long wa_lock_flags;
 	unsigned long lock_flags;
 
 	isr_reg = bank->base + bank->regs->irqstatus;
 	if (WARN_ON(!isr_reg))
-		goto exit;
+		return;
+
 
 	if (WARN_ONCE(!pm_runtime_active(bank->chip.parent),
 		      "gpio irq%i while runtime suspended?\n", irq))
@@ -609,17 +614,43 @@ static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
 
 			raw_spin_lock_irqsave(&bank->wa_lock, wa_lock_flags);
 
-			generic_handle_irq(irq_find_mapping(bank->chip.irq.domain,
-							    bit));
+			ipipe_handle_demuxed_irq(irq_find_mapping(bank->chip.irq.domain,
+						 bit));
 
 			raw_spin_unlock_irqrestore(&bank->wa_lock,
 						   wa_lock_flags);
 		}
 	}
-exit:
+}
+
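+/*
+ * With CONFIG_IPIPE, the bank IRQ is demuxed from a chained flow
+ * handler rather than a requested action, so the per-pin interrupts
+ * can be fed to the pipeline via ipipe_handle_demuxed_irq().
+ */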
+#ifdef CONFIG_IPIPE
+
+static void omap_gpio_irq_handler(struct irq_desc *d)
+{
+	struct gpio_bank *bank = irq_desc_get_handler_data(d);
+	__omap_gpio_irq_handler(bank);
+}
+
+#else
+
+static irqreturn_t omap_gpio_irq_handler(int irq, void *gpiobank)
+{
+	struct gpio_bank *bank = gpiobank;
+
+	pm_runtime_get_sync(bank->chip.parent);
+	__omap_gpio_irq_handler(bank);
+	pm_runtime_put(bank->chip.parent);
+
 	return IRQ_HANDLED;
 }
 
+#endif
+
 static unsigned int omap_gpio_irq_startup(struct irq_data *d)
 {
 	struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -682,6 +708,19 @@ static void omap_gpio_mask_irq(struct irq_data *d)
 	raw_spin_unlock_irqrestore(&bank->lock, flags);
 }
 
+static void omap_gpio_mask_ack_irq(struct irq_data *d)
+{
+	struct gpio_bank *bank = omap_irq_data_get_bank(d);
+	unsigned offset = d->hwirq;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&bank->lock, flags);
+	omap_set_gpio_irqenable(bank, offset, 0);
+	omap_set_gpio_triggering(bank, offset, IRQ_TYPE_NONE);
+	omap_clear_gpio_irqstatus(bank, offset);
+	raw_spin_unlock_irqrestore(&bank->lock, flags);
+}
+
 static void omap_gpio_unmask_irq(struct irq_data *d)
 {
 	struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -1041,11 +1080,16 @@ static int omap_gpio_chip_init(struct gpio_bank *bank, struct irq_chip *irqc)
 		return ret;
 	}
 
+#ifdef CONFIG_IPIPE
+	irq_set_chained_handler_and_data(bank->irq,
+					 omap_gpio_irq_handler, bank);
+#else
 	ret = devm_request_irq(bank->chip.parent, bank->irq,
 			       omap_gpio_irq_handler,
 			       0, dev_name(bank->chip.parent), bank);
 	if (ret)
 		gpiochip_remove(&bank->chip);
+#endif
 
 	if (!bank->is_mpuio)
 		gpio += bank->width;
@@ -1368,13 +1412,14 @@ static int omap_gpio_probe(struct platform_device *pdev)
 	irqc->irq_shutdown = omap_gpio_irq_shutdown,
 	irqc->irq_ack = dummy_irq_chip.irq_ack,
 	irqc->irq_mask = omap_gpio_mask_irq,
+	irqc->irq_mask_ack = omap_gpio_mask_ack_irq,
 	irqc->irq_unmask = omap_gpio_unmask_irq,
 	irqc->irq_set_type = omap_gpio_irq_type,
 	irqc->irq_set_wake = omap_gpio_wake_enable,
 	irqc->irq_bus_lock = omap_gpio_irq_bus_lock,
 	irqc->irq_bus_sync_unlock = gpio_irq_bus_sync_unlock,
 	irqc->name = dev_name(&pdev->dev);
-	irqc->flags = IRQCHIP_MASK_ON_SUSPEND;
+	irqc->flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE;
 	irqc->parent_device = dev;
 
 	bank->irq = platform_get_irq(pdev, 0);
diff --git a/drivers/gpio/gpio-pl061.c b/drivers/gpio/gpio-pl061.c
index 722ce5cf861e..6d3150312f95 100644
--- a/drivers/gpio/gpio-pl061.c
+++ b/drivers/gpio/gpio-pl061.c
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/pinctrl/consumer.h>
 #include <linux/pm.h>
+#include <linux/ipipe.h>
 
 #define GPIODIR 0x400
 #define GPIOIS  0x404
@@ -47,7 +48,11 @@ struct pl061_context_save_regs {
 #endif
 
 struct pl061 {
+#ifdef CONFIG_IPIPE
+	ipipe_spinlock_t	lock;
+#else
 	raw_spinlock_t		lock;
+#endif
 
 	void __iomem		*base;
 	struct gpio_chip	gc;
@@ -219,8 +224,8 @@ static void pl061_irq_handler(struct irq_desc *desc)
 	pending = readb(pl061->base + GPIOMIS);
 	if (pending) {
 		for_each_set_bit(offset, &pending, PL061_GPIO_NR)
-			generic_handle_irq(irq_find_mapping(gc->irq.domain,
-							    offset));
+			ipipe_handle_demuxed_irq(irq_find_mapping(gc->irq.domain,
+								  offset));
 	}
 
 	chained_irq_exit(irqchip, desc);
@@ -231,6 +236,22 @@ static void pl061_irq_mask(struct irq_data *d)
 	struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
 	struct pl061 *pl061 = gpiochip_get_data(gc);
 	u8 mask = BIT(irqd_to_hwirq(d) % PL061_GPIO_NR);
+	unsigned long flags;
+	u8 gpioie;
+
+	raw_spin_lock_irqsave(&pl061->lock, flags);
+	gpioie = readb(pl061->base + GPIOIE) & ~mask;
+	writeb(gpioie, pl061->base + GPIOIE);
+	ipipe_lock_irq(d->irq);
+	raw_spin_unlock_irqrestore(&pl061->lock, flags);
+}
+
+#ifdef CONFIG_IPIPE
+static void pl061_irq_mask_ack(struct irq_data *d)
+{
+	struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+	struct pl061 *pl061 = gpiochip_get_data(gc);
+	u8 mask = BIT(irqd_to_hwirq(d) % PL061_GPIO_NR);
 	u8 gpioie;
 
 	raw_spin_lock(&pl061->lock);
@@ -238,6 +259,7 @@ static void pl061_irq_mask(struct irq_data *d)
 	writeb(gpioie, pl061->base + GPIOIE);
 	raw_spin_unlock(&pl061->lock);
 }
+#endif
 
 static void pl061_irq_unmask(struct irq_data *d)
 {
@@ -320,6 +342,10 @@ static int pl061_probe(struct amba_device *adev, const struct amba_id *id)
 	pl061->irq_chip.irq_unmask = pl061_irq_unmask;
 	pl061->irq_chip.irq_set_type = pl061_irq_type;
 	pl061->irq_chip.irq_set_wake = pl061_irq_set_wake;
+#ifdef CONFIG_IPIPE
+	pl061->irq_chip.irq_mask_ack = pl061_irq_mask_ack;
+	pl061->irq_chip.flags = IRQCHIP_PIPELINE_SAFE;
+#endif
 
 	writeb(0, pl061->base + GPIOIE); /* disable irqs */
 	irq = adev->irq[0];
diff --git a/drivers/gpio/gpio-zynq.c b/drivers/gpio/gpio-zynq.c
index 7835aad6d162..f3bba132bf3d 100644
--- a/drivers/gpio/gpio-zynq.c
+++ b/drivers/gpio/gpio-zynq.c
@@ -10,6 +10,7 @@
 #include <linux/gpio/driver.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
+#include <linux/ipipe.h>
 #include <linux/io.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
@@ -126,6 +127,8 @@ struct zynq_gpio {
 	struct gpio_regs context;
 };
 
+static IPIPE_DEFINE_RAW_SPINLOCK(zynq_gpio_lock);
+
 /**
  * struct zynq_platform_data -  zynq gpio platform data structure
  * @label:	string to store in gpio->label
@@ -298,6 +301,7 @@ static int zynq_gpio_dir_in(struct gpio_chip *chip, unsigned int pin)
 	u32 reg;
 	unsigned int bank_num, bank_pin_num;
 	struct zynq_gpio *gpio = gpiochip_get_data(chip);
+	unsigned long flags;
 
 	zynq_gpio_get_bank_pin(pin, &bank_num, &bank_pin_num, gpio);
 
@@ -309,10 +313,12 @@ static int zynq_gpio_dir_in(struct gpio_chip *chip, unsigned int pin)
 	    (bank_pin_num == 7 || bank_pin_num == 8))
 		return -EINVAL;
 
+	raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
 	/* clear the bit in direction mode reg to set the pin as input */
 	reg = readl_relaxed(gpio->base_addr + ZYNQ_GPIO_DIRM_OFFSET(bank_num));
 	reg &= ~BIT(bank_pin_num);
 	writel_relaxed(reg, gpio->base_addr + ZYNQ_GPIO_DIRM_OFFSET(bank_num));
+	raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
 
 	return 0;
 }
@@ -335,9 +341,11 @@ static int zynq_gpio_dir_out(struct gpio_chip *chip, unsigned int pin,
 	u32 reg;
 	unsigned int bank_num, bank_pin_num;
 	struct zynq_gpio *gpio = gpiochip_get_data(chip);
+	unsigned long flags;
 
 	zynq_gpio_get_bank_pin(pin, &bank_num, &bank_pin_num, gpio);
 
+	raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
 	/* set the GPIO pin as output */
 	reg = readl_relaxed(gpio->base_addr + ZYNQ_GPIO_DIRM_OFFSET(bank_num));
 	reg |= BIT(bank_pin_num);
@@ -347,6 +355,7 @@ static int zynq_gpio_dir_out(struct gpio_chip *chip, unsigned int pin,
 	reg = readl_relaxed(gpio->base_addr + ZYNQ_GPIO_OUTEN_OFFSET(bank_num));
 	reg |= BIT(bank_pin_num);
 	writel_relaxed(reg, gpio->base_addr + ZYNQ_GPIO_OUTEN_OFFSET(bank_num));
+	raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
 
 	/* set the state of the pin */
 	zynq_gpio_set_value(chip, pin, state);
@@ -388,11 +397,15 @@ static void zynq_gpio_irq_mask(struct irq_data *irq_data)
 	unsigned int device_pin_num, bank_num, bank_pin_num;
 	struct zynq_gpio *gpio =
 		gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+	unsigned long flags;
 
 	device_pin_num = irq_data->hwirq;
 	zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+	raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
+	ipipe_lock_irq(irq_data->irq);
 	writel_relaxed(BIT(bank_pin_num),
 		       gpio->base_addr + ZYNQ_GPIO_INTDIS_OFFSET(bank_num));
+	raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
 }
 
 /**
@@ -409,11 +422,15 @@ static void zynq_gpio_irq_unmask(struct irq_data *irq_data)
 	unsigned int device_pin_num, bank_num, bank_pin_num;
 	struct zynq_gpio *gpio =
 		gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+	unsigned long flags;
 
 	device_pin_num = irq_data->hwirq;
 	zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+	raw_spin_lock_irqsave(&zynq_gpio_lock, flags);
 	writel_relaxed(BIT(bank_pin_num),
 		       gpio->base_addr + ZYNQ_GPIO_INTEN_OFFSET(bank_num));
+	ipipe_unlock_irq(irq_data->irq);
+	raw_spin_unlock_irqrestore(&zynq_gpio_lock, flags);
 }
 
 /**
@@ -571,11 +588,51 @@ static void zynq_gpio_irq_relres(struct irq_data *d)
 	pm_runtime_put(chip->parent);
 }
 
+#ifdef CONFIG_IPIPE
+
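+/*
+ * irq_hold masks and acks the pin interrupt while the pipeline keeps
+ * it pending; irq_release re-enables it once the root stage is done.
+ */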
+static void zynq_gpio_hold_irq(struct irq_data *irq_data)
+{
+	unsigned int device_pin_num, bank_num, bank_pin_num;
+	struct zynq_gpio *gpio =
+		gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+
+	device_pin_num = irq_data->hwirq;
+	zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+	raw_spin_lock(&zynq_gpio_lock);
+	writel_relaxed(BIT(bank_pin_num),
+			gpio->base_addr + ZYNQ_GPIO_INTDIS_OFFSET(bank_num));
+	writel_relaxed(BIT(bank_pin_num),
+			gpio->base_addr + ZYNQ_GPIO_INTSTS_OFFSET(bank_num));
+	raw_spin_unlock(&zynq_gpio_lock);
+}
+
+static void zynq_gpio_release_irq(struct irq_data *irq_data)
+{
+	unsigned int device_pin_num, bank_num, bank_pin_num;
+	struct zynq_gpio *gpio =
+		gpiochip_get_data(irq_data_get_irq_chip_data(irq_data));
+
+	device_pin_num = irq_data->hwirq;
+	zynq_gpio_get_bank_pin(device_pin_num, &bank_num, &bank_pin_num, gpio);
+	writel_relaxed(BIT(bank_pin_num),
+			gpio->base_addr + ZYNQ_GPIO_INTEN_OFFSET(bank_num));
+}
+
+#endif /* CONFIG_IPIPE */
+
 /* irq chip descriptor */
 static struct irq_chip zynq_gpio_level_irqchip = {
-	.name		= DRIVER_NAME,
+	.name		= DRIVER_NAME "-level",
 	.irq_enable	= zynq_gpio_irq_enable,
 	.irq_eoi	= zynq_gpio_irq_ack,
+#ifdef CONFIG_IPIPE
+	.irq_hold	= zynq_gpio_hold_irq,
+	.irq_release	= zynq_gpio_release_irq,
+#endif
 	.irq_mask	= zynq_gpio_irq_mask,
 	.irq_unmask	= zynq_gpio_irq_unmask,
 	.irq_set_type	= zynq_gpio_set_irq_type,
@@ -583,20 +636,24 @@ static struct irq_chip zynq_gpio_level_irqchip = {
 	.irq_request_resources = zynq_gpio_irq_reqres,
 	.irq_release_resources = zynq_gpio_irq_relres,
 	.flags		= IRQCHIP_EOI_THREADED | IRQCHIP_EOI_IF_HANDLED |
-			  IRQCHIP_MASK_ON_SUSPEND,
+			IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE,
 };
 
 static struct irq_chip zynq_gpio_edge_irqchip = {
-	.name		= DRIVER_NAME,
+	.name		= DRIVER_NAME "-edge",
 	.irq_enable	= zynq_gpio_irq_enable,
+#ifdef CONFIG_IPIPE
+	.irq_mask_ack	= zynq_gpio_hold_irq,
+#else
 	.irq_ack	= zynq_gpio_irq_ack,
+#endif
 	.irq_mask	= zynq_gpio_irq_mask,
 	.irq_unmask	= zynq_gpio_irq_unmask,
 	.irq_set_type	= zynq_gpio_set_irq_type,
 	.irq_set_wake	= zynq_gpio_set_wake,
 	.irq_request_resources = zynq_gpio_irq_reqres,
 	.irq_release_resources = zynq_gpio_irq_relres,
-	.flags		= IRQCHIP_MASK_ON_SUSPEND,
+	.flags		= IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE,
 };
 
 static void zynq_gpio_handle_bank_irq(struct zynq_gpio *gpio,
@@ -614,7 +671,7 @@ static void zynq_gpio_handle_bank_irq(struct zynq_gpio *gpio,
 		unsigned int gpio_irq;
 
 		gpio_irq = irq_find_mapping(irqdomain, offset + bank_offset);
-		generic_handle_irq(gpio_irq);
+		ipipe_handle_demuxed_irq(gpio_irq);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index eee9fcbe0434..fea180869ae0 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -2168,7 +2168,7 @@ static void ring_destroy(struct intel_engine_cs *engine)
 	kfree(engine);
 }
 
-static void setup_irq(struct intel_engine_cs *engine)
+static void gt_setup_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
 
@@ -2194,7 +2194,7 @@ static void setup_common(struct intel_engine_cs *engine)
 	/* gen8+ are only supported with execlists */
 	GEM_BUG_ON(INTEL_GEN(i915) >= 8);
 
-	setup_irq(engine);
+	gt_setup_irq(engine);
 
 	engine->destroy = ring_destroy;
 
diff --git a/drivers/gpu/ipu-v3/ipu-common.c b/drivers/gpu/ipu-v3/ipu-common.c
index b3dae9ec1a38..e7c336386808 100644
--- a/drivers/gpu/ipu-v3/ipu-common.c
+++ b/drivers/gpu/ipu-v3/ipu-common.c
@@ -1081,7 +1081,7 @@ static void ipu_irq_handle(struct ipu_soc *ipu, const int *regs, int num_regs)
 			irq = irq_linear_revmap(ipu->domain,
 						regs[i] * 32 + bit);
 			if (irq)
-				generic_handle_irq(irq);
+				ipipe_handle_demuxed_irq(irq);
 		}
 	}
 }
@@ -1305,6 +1305,7 @@ static int ipu_irq_init(struct ipu_soc *ipu)
 		ct->chip.irq_ack = irq_gc_ack_set_bit;
 		ct->chip.irq_mask = irq_gc_mask_clr_bit;
 		ct->chip.irq_unmask = irq_gc_mask_set_bit;
+		ct->chip.flags = IRQCHIP_PIPELINE_SAFE;
 		ct->regs.ack = IPU_INT_STAT(i / 32);
 		ct->regs.mask = IPU_INT_CTRL(i / 32);
 	}
diff --git a/drivers/gpu/ipu-v3/ipu-prv.h b/drivers/gpu/ipu-v3/ipu-prv.h
index 291ac1bab66d..95edf23b95ef 100644
--- a/drivers/gpu/ipu-v3/ipu-prv.h
+++ b/drivers/gpu/ipu-v3/ipu-prv.h
@@ -170,7 +170,7 @@ struct ipu_soc {
 	struct device		*dev;
 	const struct ipu_devtype	*devtype;
 	enum ipuv3_type		ipu_type;
-	spinlock_t		lock;
+	ipipe_spinlock_t	lock;
 	struct mutex		channel_lock;
 	struct list_head	channels;
 
diff --git a/drivers/irqchip/irq-atmel-aic.c b/drivers/irqchip/irq-atmel-aic.c
deleted file mode 100644
index bb1ad451392f..000000000000
--- a/drivers/irqchip/irq-atmel-aic.c
+++ /dev/null
@@ -1,274 +0,0 @@
-/*
- * Atmel AT91 AIC (Advanced Interrupt Controller) driver
- *
- *  Copyright (C) 2004 SAN People
- *  Copyright (C) 2004 ATMEL
- *  Copyright (C) Rick Bronson
- *  Copyright (C) 2014 Free Electrons
- *
- *  Author: Boris BREZILLON <boris.brezillon@free-electrons.com>
- *
- * This file is licensed under the terms of the GNU General Public
- * License version 2.  This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
- */
-
-#include <linux/init.h>
-#include <linux/module.h>
-#include <linux/mm.h>
-#include <linux/bitmap.h>
-#include <linux/types.h>
-#include <linux/irq.h>
-#include <linux/irqchip.h>
-#include <linux/of.h>
-#include <linux/of_address.h>
-#include <linux/of_irq.h>
-#include <linux/irqdomain.h>
-#include <linux/err.h>
-#include <linux/slab.h>
-#include <linux/io.h>
-
-#include <asm/exception.h>
-#include <asm/mach/irq.h>
-
-#include "irq-atmel-aic-common.h"
-
-/* Number of irq lines managed by AIC */
-#define NR_AIC_IRQS	32
-
-#define AT91_AIC_SMR(n)			((n) * 4)
-
-#define AT91_AIC_SVR(n)			(0x80 + ((n) * 4))
-#define AT91_AIC_IVR			0x100
-#define AT91_AIC_FVR			0x104
-#define AT91_AIC_ISR			0x108
-
-#define AT91_AIC_IPR			0x10c
-#define AT91_AIC_IMR			0x110
-#define AT91_AIC_CISR			0x114
-
-#define AT91_AIC_IECR			0x120
-#define AT91_AIC_IDCR			0x124
-#define AT91_AIC_ICCR			0x128
-#define AT91_AIC_ISCR			0x12c
-#define AT91_AIC_EOICR			0x130
-#define AT91_AIC_SPU			0x134
-#define AT91_AIC_DCR			0x138
-
-static struct irq_domain *aic_domain;
-
-static asmlinkage void __exception_irq_entry
-aic_handle(struct pt_regs *regs)
-{
-	struct irq_domain_chip_generic *dgc = aic_domain->gc;
-	struct irq_chip_generic *gc = dgc->gc[0];
-	u32 irqnr;
-	u32 irqstat;
-
-	irqnr = irq_reg_readl(gc, AT91_AIC_IVR);
-	irqstat = irq_reg_readl(gc, AT91_AIC_ISR);
-
-	if (!irqstat)
-		irq_reg_writel(gc, 0, AT91_AIC_EOICR);
-	else
-		handle_domain_irq(aic_domain, irqnr, regs);
-}
-
-static int aic_retrigger(struct irq_data *d)
-{
-	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
-	/* Enable interrupt on AIC5 */
-	irq_gc_lock(gc);
-	irq_reg_writel(gc, d->mask, AT91_AIC_ISCR);
-	irq_gc_unlock(gc);
-
-	return 0;
-}
-
-static int aic_set_type(struct irq_data *d, unsigned type)
-{
-	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-	unsigned int smr;
-	int ret;
-
-	smr = irq_reg_readl(gc, AT91_AIC_SMR(d->hwirq));
-	ret = aic_common_set_type(d, type, &smr);
-	if (ret)
-		return ret;
-
-	irq_reg_writel(gc, smr, AT91_AIC_SMR(d->hwirq));
-
-	return 0;
-}
-
-#ifdef CONFIG_PM
-static void aic_suspend(struct irq_data *d)
-{
-	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
-	irq_gc_lock(gc);
-	irq_reg_writel(gc, gc->mask_cache, AT91_AIC_IDCR);
-	irq_reg_writel(gc, gc->wake_active, AT91_AIC_IECR);
-	irq_gc_unlock(gc);
-}
-
-static void aic_resume(struct irq_data *d)
-{
-	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
-	irq_gc_lock(gc);
-	irq_reg_writel(gc, gc->wake_active, AT91_AIC_IDCR);
-	irq_reg_writel(gc, gc->mask_cache, AT91_AIC_IECR);
-	irq_gc_unlock(gc);
-}
-
-static void aic_pm_shutdown(struct irq_data *d)
-{
-	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
-
-	irq_gc_lock(gc);
-	irq_reg_writel(gc, 0xffffffff, AT91_AIC_IDCR);
-	irq_reg_writel(gc, 0xffffffff, AT91_AIC_ICCR);
-	irq_gc_unlock(gc);
-}
-#else
-#define aic_suspend		NULL
-#define aic_resume		NULL
-#define aic_pm_shutdown		NULL
-#endif /* CONFIG_PM */
-
-static void __init aic_hw_init(struct irq_domain *domain)
-{
-	struct irq_chip_generic *gc = irq_get_domain_generic_chip(domain, 0);
-	int i;
-
-	/*
-	 * Perform 8 End Of Interrupt Command to make sure AIC
-	 * will not Lock out nIRQ
-	 */
-	for (i = 0; i < 8; i++)
-		irq_reg_writel(gc, 0, AT91_AIC_EOICR);
-
-	/*
-	 * Spurious Interrupt ID in Spurious Vector Register.
-	 * When there is no current interrupt, the IRQ Vector Register
-	 * reads the value stored in AIC_SPU
-	 */
-	irq_reg_writel(gc, 0xffffffff, AT91_AIC_SPU);
-
-	/* No debugging in AIC: Debug (Protect) Control Register */
-	irq_reg_writel(gc, 0, AT91_AIC_DCR);
-
-	/* Disable and clear all interrupts initially */
-	irq_reg_writel(gc, 0xffffffff, AT91_AIC_IDCR);
-	irq_reg_writel(gc, 0xffffffff, AT91_AIC_ICCR);
-
-	for (i = 0; i < 32; i++)
-		irq_reg_writel(gc, i, AT91_AIC_SVR(i));
-}
-
-static int aic_irq_domain_xlate(struct irq_domain *d,
-				struct device_node *ctrlr,
-				const u32 *intspec, unsigned int intsize,
-				irq_hw_number_t *out_hwirq,
-				unsigned int *out_type)
-{
-	struct irq_domain_chip_generic *dgc = d->gc;
-	struct irq_chip_generic *gc;
-	unsigned long flags;
-	unsigned smr;
-	int idx;
-	int ret;
-
-	if (!dgc)
-		return -EINVAL;
-
-	ret = aic_common_irq_domain_xlate(d, ctrlr, intspec, intsize,
-					  out_hwirq, out_type);
-	if (ret)
-		return ret;
-
-	idx = intspec[0] / dgc->irqs_per_chip;
-	if (idx >= dgc->num_chips)
-		return -EINVAL;
-
-	gc = dgc->gc[idx];
-
-	irq_gc_lock_irqsave(gc, flags);
-	smr = irq_reg_readl(gc, AT91_AIC_SMR(*out_hwirq));
-	aic_common_set_priority(intspec[2], &smr);
-	irq_reg_writel(gc, smr, AT91_AIC_SMR(*out_hwirq));
-	irq_gc_unlock_irqrestore(gc, flags);
-
-	return ret;
-}
-
-static const struct irq_domain_ops aic_irq_ops = {
-	.map	= irq_map_generic_chip,
-	.xlate	= aic_irq_domain_xlate,
-};
-
-static void __init at91rm9200_aic_irq_fixup(void)
-{
-	aic_common_rtc_irq_fixup();
-}
-
-static void __init at91sam9260_aic_irq_fixup(void)
-{
-	aic_common_rtt_irq_fixup();
-}
-
-static void __init at91sam9g45_aic_irq_fixup(void)
-{
-	aic_common_rtc_irq_fixup();
-	aic_common_rtt_irq_fixup();
-}
-
-static const struct of_device_id aic_irq_fixups[] __initconst = {
-	{ .compatible = "atmel,at91rm9200", .data = at91rm9200_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9g45", .data = at91sam9g45_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9n12", .data = at91rm9200_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9rl", .data = at91sam9g45_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9x5", .data = at91rm9200_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9260", .data = at91sam9260_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9261", .data = at91sam9260_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9263", .data = at91sam9260_aic_irq_fixup },
-	{ .compatible = "atmel,at91sam9g20", .data = at91sam9260_aic_irq_fixup },
-	{ /* sentinel */ },
-};
-
-static int __init aic_of_init(struct device_node *node,
-			      struct device_node *parent)
-{
-	struct irq_chip_generic *gc;
-	struct irq_domain *domain;
-
-	if (aic_domain)
-		return -EEXIST;
-
-	domain = aic_common_of_init(node, &aic_irq_ops, "atmel-aic",
-				    NR_AIC_IRQS, aic_irq_fixups);
-	if (IS_ERR(domain))
-		return PTR_ERR(domain);
-
-	aic_domain = domain;
-	gc = irq_get_domain_generic_chip(domain, 0);
-
-	gc->chip_types[0].regs.eoi = AT91_AIC_EOICR;
-	gc->chip_types[0].regs.enable = AT91_AIC_IECR;
-	gc->chip_types[0].regs.disable = AT91_AIC_IDCR;
-	gc->chip_types[0].chip.irq_mask = irq_gc_mask_disable_reg;
-	gc->chip_types[0].chip.irq_unmask = irq_gc_unmask_enable_reg;
-	gc->chip_types[0].chip.irq_retrigger = aic_retrigger;
-	gc->chip_types[0].chip.irq_set_type = aic_set_type;
-	gc->chip_types[0].chip.irq_suspend = aic_suspend;
-	gc->chip_types[0].chip.irq_resume = aic_resume;
-	gc->chip_types[0].chip.irq_pm_shutdown = aic_pm_shutdown;
-
-	aic_hw_init(domain);
-	set_handle_irq(aic_handle);
-
-	return 0;
-}
-IRQCHIP_DECLARE(at91rm9200_aic, "atmel,at91rm9200-aic", aic_of_init);
diff --git a/drivers/irqchip/irq-atmel-aic5.c b/drivers/irqchip/irq-atmel-aic5.c
index 29333497ba10..b902762126b7 100644
--- a/drivers/irqchip/irq-atmel-aic5.c
+++ b/drivers/irqchip/irq-atmel-aic5.c
@@ -80,7 +80,7 @@ aic5_handle(struct pt_regs *regs)
 	if (!irqstat)
 		irq_reg_writel(bgc, 0, AT91_AIC5_EOICR);
 	else
-		handle_domain_irq(aic5_domain, irqnr, regs);
+		ipipe_handle_domain_irq(aic5_domain, irqnr, regs);
 }
 
 static void aic5_mask(struct irq_data *d)
@@ -88,16 +88,18 @@ static void aic5_mask(struct irq_data *d)
 	struct irq_domain *domain = d->domain;
 	struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+	unsigned long flags;
 
 	/*
 	 * Disable interrupt on AIC5. We always take the lock of the
 	 * first irq chip as all chips share the same registers.
 	 */
-	irq_gc_lock(bgc);
+	flags = irq_gc_lock(bgc);
+	ipipe_lock_irq(d->irq);
 	irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
 	irq_reg_writel(gc, 1, AT91_AIC5_IDCR);
 	gc->mask_cache &= ~d->mask;
-	irq_gc_unlock(bgc);
+	irq_gc_unlock(bgc, flags);
 }
 
 static void aic5_unmask(struct irq_data *d)
@@ -105,28 +107,59 @@ static void aic5_unmask(struct irq_data *d)
 	struct irq_domain *domain = d->domain;
 	struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+	unsigned long flags;
 
 	/*
 	 * Enable interrupt on AIC5. We always take the lock of the
 	 * first irq chip as all chips share the same registers.
 	 */
-	irq_gc_lock(bgc);
+	flags = irq_gc_lock(bgc);
 	irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
 	irq_reg_writel(gc, 1, AT91_AIC5_IECR);
 	gc->mask_cache |= d->mask;
-	irq_gc_unlock(bgc);
+	ipipe_unlock_irq(d->irq);
+	irq_gc_unlock(bgc, flags);
+}
+
+#ifdef CONFIG_IPIPE
+
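+/*
+ * I-pipe: irq_hold masks and acknowledges the event at the AIC level
+ * so the controller keeps delivering other interrupts while this one
+ * is deferred; irq_release re-enables it.
+ */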
+static void aic5_hold(struct irq_data *d)
+{
+	struct irq_domain *domain = d->domain;
+	struct irq_domain_chip_generic *dgc = domain->gc;
+	struct irq_chip_generic *gc = dgc->gc[0];
+
+	irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+	irq_reg_writel(gc, 1, AT91_AIC5_IDCR);
+	irq_reg_writel(gc, 0, AT91_AIC5_EOICR);
+}
+
+static void aic5_release(struct irq_data *d)
+{
+	struct irq_domain *domain = d->domain;
+	struct irq_domain_chip_generic *dgc = domain->gc;
+	struct irq_chip_generic *gc = dgc->gc[0];
+	unsigned long flags;
+
+	flags = irq_gc_lock(gc);
+	irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+	irq_reg_writel(gc, 1, AT91_AIC5_IECR);
+	irq_gc_unlock(gc, flags);
 }
 
+#endif
+
 static int aic5_retrigger(struct irq_data *d)
 {
 	struct irq_domain *domain = d->domain;
 	struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
+	unsigned long flags;
 
 	/* Enable interrupt on AIC5 */
-	irq_gc_lock(bgc);
+	flags = irq_gc_lock(bgc);
 	irq_reg_writel(bgc, d->hwirq, AT91_AIC5_SSR);
 	irq_reg_writel(bgc, 1, AT91_AIC5_ISCR);
-	irq_gc_unlock(bgc);
+	irq_gc_unlock(bgc, flags);
 
 	return 0;
 }
@@ -135,16 +168,17 @@ static int aic5_set_type(struct irq_data *d, unsigned type)
 {
 	struct irq_domain *domain = d->domain;
 	struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
+	unsigned long flags;
 	unsigned int smr;
 	int ret;
 
-	irq_gc_lock(bgc);
+	flags = irq_gc_lock(bgc);
 	irq_reg_writel(bgc, d->hwirq, AT91_AIC5_SSR);
 	smr = irq_reg_readl(bgc, AT91_AIC5_SMR);
 	ret = aic_common_set_type(d, type, &smr);
 	if (!ret)
 		irq_reg_writel(bgc, smr, AT91_AIC5_SMR);
-	irq_gc_unlock(bgc);
+	irq_gc_unlock(bgc, flags);
 
 	return ret;
 }
@@ -160,6 +194,7 @@ static void aic5_suspend(struct irq_data *d)
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	int i;
 	u32 mask;
+	unsigned long flags;
 
 	if (smr_cache)
 		for (i = 0; i < domain->revmap_size; i++) {
@@ -167,7 +202,7 @@ static void aic5_suspend(struct irq_data *d)
 			smr_cache[i] = irq_reg_readl(bgc, AT91_AIC5_SMR);
 		}
 
-	irq_gc_lock(bgc);
+	flags = irq_gc_lock(bgc);
 	for (i = 0; i < dgc->irqs_per_chip; i++) {
 		mask = 1 << i;
 		if ((mask & gc->mask_cache) == (mask & gc->wake_active))
@@ -179,7 +214,7 @@ static void aic5_suspend(struct irq_data *d)
 		else
 			irq_reg_writel(bgc, 1, AT91_AIC5_IDCR);
 	}
-	irq_gc_unlock(bgc);
+	irq_gc_unlock(bgc, flags);
 }
 
 static void aic5_resume(struct irq_data *d)
@@ -190,8 +225,9 @@ static void aic5_resume(struct irq_data *d)
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	int i;
 	u32 mask;
+	unsigned long flags;
 
-	irq_gc_lock(bgc);
+	flags = irq_gc_lock(bgc);
 
 	if (smr_cache) {
 		irq_reg_writel(bgc, 0xffffffff, AT91_AIC5_SPU);
@@ -215,7 +251,7 @@ static void aic5_resume(struct irq_data *d)
 		else
 			irq_reg_writel(bgc, 1, AT91_AIC5_IDCR);
 	}
-	irq_gc_unlock(bgc);
+	irq_gc_unlock(bgc, flags);
 }
 
 static void aic5_pm_shutdown(struct irq_data *d)
@@ -224,15 +260,16 @@ static void aic5_pm_shutdown(struct irq_data *d)
 	struct irq_domain_chip_generic *dgc = domain->gc;
 	struct irq_chip_generic *bgc = irq_get_domain_generic_chip(domain, 0);
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+	unsigned long flags;
 	int i;
 
-	irq_gc_lock(bgc);
+	flags = irq_gc_lock(bgc);
 	for (i = 0; i < dgc->irqs_per_chip; i++) {
 		irq_reg_writel(bgc, i + gc->irq_base, AT91_AIC5_SSR);
 		irq_reg_writel(bgc, 1, AT91_AIC5_IDCR);
 		irq_reg_writel(bgc, 1, AT91_AIC5_ICCR);
 	}
-	irq_gc_unlock(bgc);
+	irq_gc_unlock(bgc, flags);
 }
 #else
 #define aic5_suspend		NULL
@@ -350,6 +387,11 @@ static int __init aic5_of_init(struct device_node *node,
 		gc->chip_types[0].chip.irq_suspend = aic5_suspend;
 		gc->chip_types[0].chip.irq_resume = aic5_resume;
 		gc->chip_types[0].chip.irq_pm_shutdown = aic5_pm_shutdown;
+#ifdef CONFIG_IPIPE
+		gc->chip_types[0].chip.irq_hold	= aic5_hold;
+		gc->chip_types[0].chip.irq_release = aic5_release;
+		gc->chip_types[0].chip.flags = IRQCHIP_PIPELINE_SAFE;
+#endif
 	}
 
 	aic5_hw_init(domain);
diff --git a/drivers/irqchip/irq-bcm2835.c b/drivers/irqchip/irq-bcm2835.c
index 418245d31921..3930dafa5d12 100644
--- a/drivers/irqchip/irq-bcm2835.c
+++ b/drivers/irqchip/irq-bcm2835.c
@@ -101,7 +101,12 @@ static void armctrl_unmask_irq(struct irq_data *d)
 static struct irq_chip armctrl_chip = {
 	.name = "ARMCTRL-level",
 	.irq_mask = armctrl_mask_irq,
-	.irq_unmask = armctrl_unmask_irq
+	.irq_unmask = armctrl_unmask_irq,
+#ifdef CONFIG_IPIPE
+	.irq_hold = armctrl_mask_irq,
+	.irq_release = armctrl_unmask_irq,
+#endif
+	.flags = IRQCHIP_PIPELINE_SAFE,
 };
 
 static int armctrl_xlate(struct irq_domain *d, struct device_node *ctrlr,
@@ -231,7 +236,7 @@ static void __exception_irq_entry bcm2835_handle_irq(
 	u32 hwirq;
 
 	while ((hwirq = get_next_armctrl_hwirq()) != ~0)
-		handle_domain_irq(intc.domain, hwirq, regs);
+		ipipe_handle_domain_irq(intc.domain, hwirq, regs);
 }
 
 static void bcm2836_chained_handle_irq(struct irq_desc *desc)
@@ -239,7 +244,7 @@ static void bcm2836_chained_handle_irq(struct irq_desc *desc)
 	u32 hwirq;
 
 	while ((hwirq = get_next_armctrl_hwirq()) != ~0)
-		generic_handle_irq(irq_linear_revmap(intc.domain, hwirq));
+		ipipe_handle_demuxed_irq(irq_linear_revmap(intc.domain, hwirq));
 }
 
 IRQCHIP_DECLARE(bcm2835_armctrl_ic, "brcm,bcm2835-armctrl-ic",
diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c
index 2038693f074c..330ff7925b51 100644
--- a/drivers/irqchip/irq-bcm2836.c
+++ b/drivers/irqchip/irq-bcm2836.c
@@ -39,40 +39,68 @@ static void bcm2836_arm_irqchip_unmask_per_cpu_irq(unsigned int reg_offset,
 	writel(readl(reg) | BIT(bit), reg);
 }
 
-static void bcm2836_arm_irqchip_mask_timer_irq(struct irq_data *d)
+static void __bcm2836_arm_irqchip_mask_timer_irq(struct irq_data *d)
 {
 	bcm2836_arm_irqchip_mask_per_cpu_irq(LOCAL_TIMER_INT_CONTROL0,
 					     d->hwirq - LOCAL_IRQ_CNTPSIRQ,
-					     smp_processor_id());
+					     raw_smp_processor_id());
 }
 
-static void bcm2836_arm_irqchip_unmask_timer_irq(struct irq_data *d)
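+/*
+ * The regular mask/unmask handlers may run with hard interrupts
+ * enabled, so wrap the per-CPU register update in a hard-disabled
+ * section; the __ variants back the irq_hold/irq_release handlers,
+ * which are entered with hard interrupts off.
+ */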
+static void bcm2836_arm_irqchip_mask_timer_irq(struct irq_data *d)
+{
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+	__bcm2836_arm_irqchip_mask_timer_irq(d);
+	hard_local_irq_restore(flags);
+}
+
+static void __bcm2836_arm_irqchip_unmask_timer_irq(struct irq_data *d)
 {
 	bcm2836_arm_irqchip_unmask_per_cpu_irq(LOCAL_TIMER_INT_CONTROL0,
 					       d->hwirq - LOCAL_IRQ_CNTPSIRQ,
-					       smp_processor_id());
+					       raw_smp_processor_id());
+}
+
+static void bcm2836_arm_irqchip_unmask_timer_irq(struct irq_data *d)
+{
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+	__bcm2836_arm_irqchip_unmask_timer_irq(d);
+	hard_local_irq_restore(flags);
 }
 
 static struct irq_chip bcm2836_arm_irqchip_timer = {
 	.name		= "bcm2836-timer",
 	.irq_mask	= bcm2836_arm_irqchip_mask_timer_irq,
 	.irq_unmask	= bcm2836_arm_irqchip_unmask_timer_irq,
+#ifdef CONFIG_IPIPE
+	.irq_hold	= __bcm2836_arm_irqchip_mask_timer_irq,
+	.irq_release	= __bcm2836_arm_irqchip_unmask_timer_irq,
+#endif
+	.flags		= IRQCHIP_PIPELINE_SAFE,
 };
 
 static void bcm2836_arm_irqchip_mask_pmu_irq(struct irq_data *d)
 {
-	writel(1 << smp_processor_id(), intc.base + LOCAL_PM_ROUTING_CLR);
+	writel(1 << raw_smp_processor_id(), intc.base + LOCAL_PM_ROUTING_CLR);
 }
 
 static void bcm2836_arm_irqchip_unmask_pmu_irq(struct irq_data *d)
 {
-	writel(1 << smp_processor_id(), intc.base + LOCAL_PM_ROUTING_SET);
+	writel(1 << raw_smp_processor_id(), intc.base + LOCAL_PM_ROUTING_SET);
 }
 
 static struct irq_chip bcm2836_arm_irqchip_pmu = {
 	.name		= "bcm2836-pmu",
 	.irq_mask	= bcm2836_arm_irqchip_mask_pmu_irq,
 	.irq_unmask	= bcm2836_arm_irqchip_unmask_pmu_irq,
+#ifdef CONFIG_IPIPE
+	.irq_hold	= bcm2836_arm_irqchip_mask_pmu_irq,
+	.irq_release	= bcm2836_arm_irqchip_unmask_pmu_irq,
+#endif
+	.flags		= IRQCHIP_PIPELINE_SAFE,
 };
 
 static void bcm2836_arm_irqchip_mask_gpu_irq(struct irq_data *d)
@@ -87,6 +115,11 @@ static struct irq_chip bcm2836_arm_irqchip_gpu = {
 	.name		= "bcm2836-gpu",
 	.irq_mask	= bcm2836_arm_irqchip_mask_gpu_irq,
 	.irq_unmask	= bcm2836_arm_irqchip_unmask_gpu_irq,
+#ifdef CONFIG_IPIPE
+	.irq_hold	= bcm2836_arm_irqchip_mask_gpu_irq,
+	.irq_release	= bcm2836_arm_irqchip_unmask_gpu_irq,
+#endif
+	.flags		= IRQCHIP_PIPELINE_SAFE,
 };
 
 static int bcm2836_map(struct irq_domain *d, unsigned int irq,
@@ -123,7 +156,7 @@ static int bcm2836_map(struct irq_domain *d, unsigned int irq,
 static void
 __exception_irq_entry bcm2836_arm_irqchip_handle_irq(struct pt_regs *regs)
 {
-	int cpu = smp_processor_id();
+	int cpu = raw_smp_processor_id();
 	u32 stat;
 
 	stat = readl_relaxed(intc.base + LOCAL_IRQ_PENDING0 + 4 * cpu);
@@ -135,12 +168,12 @@ __exception_irq_entry bcm2836_arm_irqchip_handle_irq(struct pt_regs *regs)
 		u32 ipi = ffs(mbox_val) - 1;
 
 		writel(1 << ipi, mailbox0);
-		handle_IPI(ipi, regs);
+		ipipe_handle_multi_ipi(ipi, regs);
 #endif
 	} else if (stat) {
 		u32 hwirq = ffs(stat) - 1;
 
-		handle_domain_irq(intc.domain, hwirq, regs);
+		ipipe_handle_domain_irq(intc.domain, hwirq, regs);
 	}
 }
 
diff --git a/drivers/irqchip/irq-bcm7120-l2.c b/drivers/irqchip/irq-bcm7120-l2.c
index 586df3587be0..47b7524890e3 100644
--- a/drivers/irqchip/irq-bcm7120-l2.c
+++ b/drivers/irqchip/irq-bcm7120-l2.c
@@ -58,6 +58,7 @@ static void bcm7120_l2_intc_irq_handle(struct irq_desc *desc)
 	struct bcm7120_l2_intc_data *b = data->b;
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 	unsigned int idx;
+	unsigned long flags;
 
 	chained_irq_enter(chip, desc);
 
@@ -68,11 +69,11 @@ static void bcm7120_l2_intc_irq_handle(struct irq_desc *desc)
 		unsigned long pending;
 		int hwirq;
 
-		irq_gc_lock(gc);
+		flags = irq_gc_lock(gc);
 		pending = irq_reg_readl(gc, b->stat_offset[idx]) &
 					    gc->mask_cache &
 					    data->irq_map_mask[idx];
-		irq_gc_unlock(gc);
+		irq_gc_unlock(gc, flags);
 
 		for_each_set_bit(hwirq, &pending, IRQS_PER_WORD) {
 			generic_handle_irq(irq_find_mapping(b->domain,
@@ -87,22 +88,24 @@ static void bcm7120_l2_intc_suspend(struct irq_chip_generic *gc)
 {
 	struct bcm7120_l2_intc_data *b = gc->private;
 	struct irq_chip_type *ct = gc->chip_types;
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	if (b->can_wake)
 		irq_reg_writel(gc, gc->mask_cache | gc->wake_active,
 			       ct->regs.mask);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 static void bcm7120_l2_intc_resume(struct irq_chip_generic *gc)
 {
 	struct irq_chip_type *ct = gc->chip_types;
+	unsigned long flags;
 
 	/* Restore the saved mask */
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	irq_reg_writel(gc, gc->mask_cache, ct->regs.mask);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 static int bcm7120_l2_intc_init_one(struct device_node *dn,
diff --git a/drivers/irqchip/irq-brcmstb-l2.c b/drivers/irqchip/irq-brcmstb-l2.c
index 0298ede67e51..c7cb625e45f5 100644
--- a/drivers/irqchip/irq-brcmstb-l2.c
+++ b/drivers/irqchip/irq-brcmstb-l2.c
@@ -123,7 +123,7 @@ static void brcmstb_l2_intc_suspend(struct irq_data *d)
 	struct brcmstb_l2_intc_data *b = gc->private;
 	unsigned long flags;
 
-	irq_gc_lock_irqsave(gc, flags);
+	flags = irq_gc_lock(gc);
 	/* Save the current mask */
 	b->saved_mask = irq_reg_readl(gc, ct->regs.mask);
 
@@ -132,7 +132,7 @@ static void brcmstb_l2_intc_suspend(struct irq_data *d)
 		irq_reg_writel(gc, ~gc->wake_active, ct->regs.disable);
 		irq_reg_writel(gc, gc->wake_active, ct->regs.enable);
 	}
-	irq_gc_unlock_irqrestore(gc, flags);
+	irq_gc_unlock(gc, flags);
 }
 
 static void brcmstb_l2_intc_resume(struct irq_data *d)
@@ -142,7 +142,7 @@ static void brcmstb_l2_intc_resume(struct irq_data *d)
 	struct brcmstb_l2_intc_data *b = gc->private;
 	unsigned long flags;
 
-	irq_gc_lock_irqsave(gc, flags);
+	flags = irq_gc_lock(gc);
 	if (ct->chip.irq_ack) {
 		/* Clear unmasked non-wakeup interrupts */
 		irq_reg_writel(gc, ~b->saved_mask & ~gc->wake_active,
@@ -152,7 +152,7 @@ static void brcmstb_l2_intc_resume(struct irq_data *d)
 	/* Restore the saved mask */
 	irq_reg_writel(gc, b->saved_mask, ct->regs.disable);
 	irq_reg_writel(gc, ~b->saved_mask, ct->regs.enable);
-	irq_gc_unlock_irqrestore(gc, flags);
+	irq_gc_unlock(gc, flags);
 }
 
 static int __init brcmstb_l2_intc_of_init(struct device_node *np,
diff --git a/drivers/irqchip/irq-crossbar.c b/drivers/irqchip/irq-crossbar.c
index a05a7501e107..9c43dc674335 100644
--- a/drivers/irqchip/irq-crossbar.c
+++ b/drivers/irqchip/irq-crossbar.c
@@ -12,6 +12,7 @@
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
 #include <linux/slab.h>
+#include <linux/ipipe.h>
 
 #define IRQ_FREE	-1
 #define IRQ_RESERVED	-2
@@ -65,10 +66,15 @@ static struct irq_chip crossbar_chip = {
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_set_type		= irq_chip_set_type_parent,
 	.flags			= IRQCHIP_MASK_ON_SUSPEND |
-				  IRQCHIP_SKIP_SET_WAKE,
+				  IRQCHIP_SKIP_SET_WAKE |
+				  IRQCHIP_PIPELINE_SAFE,
 #ifdef CONFIG_SMP
 	.irq_set_affinity	= irq_chip_set_affinity_parent,
 #endif
+#ifdef CONFIG_IPIPE
+	.irq_hold		= irq_chip_hold_parent,
+	.irq_release		= irq_chip_release_parent,
+#endif
 };
 
 static int allocate_gic_irq(struct irq_domain *domain, unsigned virq,
diff --git a/drivers/irqchip/irq-dw-apb-ictl.c b/drivers/irqchip/irq-dw-apb-ictl.c
index e4550e9c810b..01923eca46e5 100644
--- a/drivers/irqchip/irq-dw-apb-ictl.c
+++ b/drivers/irqchip/irq-dw-apb-ictl.c
@@ -17,6 +17,7 @@
 #include <linux/irqchip/chained_irq.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
+#include <linux/ipipe.h>
 
 #define APB_INT_ENABLE_L	0x00
 #define APB_INT_ENABLE_H	0x04
@@ -42,7 +43,7 @@ static void dw_apb_ictl_handler(struct irq_desc *desc)
 			u32 hwirq = ffs(stat) - 1;
 			u32 virq = irq_find_mapping(d, gc->irq_base + hwirq);
 
-			generic_handle_irq(virq);
+			ipipe_handle_demuxed_irq(virq);
 			stat &= ~(1 << hwirq);
 		}
 	}
@@ -55,11 +56,12 @@ static void dw_apb_ictl_resume(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	writel_relaxed(~0, gc->reg_base + ct->regs.enable);
 	writel_relaxed(*ct->mask_cache, gc->reg_base + ct->regs.mask);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 #else
 #define dw_apb_ictl_resume	NULL
@@ -144,6 +146,7 @@ static int __init dw_apb_ictl_init(struct device_node *np,
 		gc->chip_types[0].chip.irq_mask = irq_gc_mask_set_bit;
 		gc->chip_types[0].chip.irq_unmask = irq_gc_mask_clr_bit;
 		gc->chip_types[0].chip.irq_resume = dw_apb_ictl_resume;
+		gc->chip_types[0].chip.flags |= IRQCHIP_PIPELINE_SAFE;
 	}
 
 	irq_set_chained_handler_and_data(irq, dw_apb_ictl_handler, domain);
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index e88e75c22b6a..ca18130d9818 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -72,14 +72,22 @@ struct v2m_data {
 
 static void gicv2m_mask_msi_irq(struct irq_data *d)
 {
+	unsigned long flags;
+
+	flags = hard_cond_local_irq_save();
 	pci_msi_mask_irq(d);
 	irq_chip_mask_parent(d);
+	hard_cond_local_irq_restore(flags);
 }
 
 static void gicv2m_unmask_msi_irq(struct irq_data *d)
 {
+	unsigned long flags;
+
+	flags = hard_cond_local_irq_save();
 	pci_msi_unmask_irq(d);
 	irq_chip_unmask_parent(d);
+	hard_cond_local_irq_restore(flags);
 }
 
 static struct irq_chip gicv2m_msi_irq_chip = {
@@ -88,6 +96,11 @@ static struct irq_chip gicv2m_msi_irq_chip = {
 	.irq_unmask		= gicv2m_unmask_msi_irq,
 	.irq_eoi		= irq_chip_eoi_parent,
 	.irq_write_msi_msg	= pci_msi_domain_write_msg,
+#ifdef CONFIG_IPIPE
+	.irq_hold		= irq_chip_hold_parent,
+	.irq_release		= irq_chip_release_parent,
+#endif
+	.flags			= IRQCHIP_PIPELINE_SAFE,
 };
 
 static struct msi_domain_info gicv2m_msi_domain_info = {
@@ -129,6 +142,11 @@ static struct irq_chip gicv2m_irq_chip = {
 	.irq_eoi		= irq_chip_eoi_parent,
 	.irq_set_affinity	= irq_chip_set_affinity_parent,
 	.irq_compose_msi_msg	= gicv2m_compose_msi_msg,
+#ifdef CONFIG_IPIPE
+	.irq_hold		= irq_chip_hold_parent,
+	.irq_release		= irq_chip_release_parent,
+#endif
+	.flags			= IRQCHIP_PIPELINE_SAFE,
 };
 
 static int gicv2m_irq_gic_domain_alloc(struct irq_domain *domain,
@@ -251,6 +269,7 @@ static bool is_msi_spi_valid(u32 base, u32 num)
 
 static struct irq_chip gicv2m_pmsi_irq_chip = {
 	.name			= "pMSI",
+	.flags			= IRQCHIP_PIPELINE_SAFE,
 };
 
 static struct msi_domain_ops gicv2m_pmsi_ops = {
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 446603efbc90..a471d50d84f9 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -328,7 +328,12 @@ static void gic_poke_irq(struct irq_data *d, u32 offset)
 
 static void gic_mask_irq(struct irq_data *d)
 {
+	unsigned long flags;
+
+	flags = hard_cond_local_irq_save();
+	ipipe_lock_irq(d->irq);
 	gic_poke_irq(d, GICD_ICENABLER);
+	hard_cond_local_irq_restore(flags);
 }
 
 static void gic_eoimode1_mask_irq(struct irq_data *d)
@@ -348,7 +353,12 @@ static void gic_eoimode1_mask_irq(struct irq_data *d)
 
 static void gic_unmask_irq(struct irq_data *d)
 {
+	unsigned long flags;
+
+	flags = hard_cond_local_irq_save();
 	gic_poke_irq(d, GICD_ISENABLER);
+	ipipe_unlock_irq(d->irq);
+	hard_cond_local_irq_restore(flags);
 }
 
 static inline bool gic_supports_nmi(void)
@@ -520,6 +530,27 @@ static void gic_eoimode1_eoi_irq(struct irq_data *d)
 	gic_write_dir(gic_irq(d));
 }
 
+#ifdef CONFIG_IPIPE
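+/*
+ * Holding an interrupt disables it at the distributor level, then
+ * completes it (deactivating first when it was forwarded to a vCPU
+ * in EOImode 1) so the GIC can keep delivering other interrupts;
+ * releasing simply re-enables it.
+ */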
+static void gic_hold_irq(struct irq_data *d)
+{
+	struct irq_chip *chip = irq_data_get_irq_chip(d);
+
+	gic_poke_irq(d, GICD_ICENABLER);
+
+	if (chip->irq_eoi == gic_eoimode1_eoi_irq) {
+		if (irqd_is_forwarded_to_vcpu(d))
+			gic_poke_irq(d, GICD_ICACTIVER);
+		gic_eoimode1_eoi_irq(d);
+	} else
+		gic_eoi_irq(d);
+}
+
+static void gic_release_irq(struct irq_data *d)
+{
+	gic_poke_irq(d, GICD_ISENABLER);
+}
+#endif /* CONFIG_IPIPE */
+
 static int gic_set_type(struct irq_data *d, unsigned int type)
 {
 	enum gic_intid_range range;
@@ -645,7 +676,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 		else
 			isb();
 
-		err = handle_domain_irq(gic_data.domain, irqnr, regs);
+		err = ipipe_handle_domain_irq(gic_data.domain, irqnr, regs);
 		if (err) {
 			WARN_ONCE(true, "Unexpected interrupt received!\n");
 			gic_deactivate_unhandled(irqnr);
@@ -664,7 +695,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 		 * that any shared data read by handle_IPI will
 		 * be read after the ACK.
 		 */
-		handle_IPI(irqnr, regs);
+		ipipe_handle_multi_ipi(irqnr, regs);
 #else
 		WARN_ONCE(true, "Unexpected SGI received!\n");
 #endif
@@ -1208,6 +1239,10 @@ static struct irq_chip gic_chip = {
 	.irq_unmask		= gic_unmask_irq,
 	.irq_eoi		= gic_eoi_irq,
 	.irq_set_type		= gic_set_type,
+#ifdef CONFIG_IPIPE
+	.irq_hold		= gic_hold_irq,
+	.irq_release		= gic_release_irq,
+#endif
 	.irq_set_affinity	= gic_set_affinity,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
@@ -1215,6 +1250,7 @@ static struct irq_chip gic_chip = {
 	.irq_nmi_teardown	= gic_irq_nmi_teardown,
 	.flags			= IRQCHIP_SET_TYPE_MASKED |
 				  IRQCHIP_SKIP_SET_WAKE |
+				  IRQCHIP_PIPELINE_SAFE |
 				  IRQCHIP_MASK_ON_SUSPEND,
 };
 
@@ -1224,6 +1260,10 @@ static struct irq_chip gic_eoimode1_chip = {
 	.irq_unmask		= gic_unmask_irq,
 	.irq_eoi		= gic_eoimode1_eoi_irq,
 	.irq_set_type		= gic_set_type,
+#ifdef CONFIG_IPIPE
+	.irq_hold		= gic_hold_irq,
+	.irq_release		= gic_release_irq,
+#endif
 	.irq_set_affinity	= gic_set_affinity,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
@@ -1232,6 +1272,7 @@ static struct irq_chip gic_eoimode1_chip = {
 	.irq_nmi_teardown	= gic_irq_nmi_teardown,
 	.flags			= IRQCHIP_SET_TYPE_MASKED |
 				  IRQCHIP_SKIP_SET_WAKE |
+				  IRQCHIP_PIPELINE_SAFE |
 				  IRQCHIP_MASK_ON_SUSPEND,
 };
 
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 882204d1ef4f..156e94ee1158 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -35,6 +35,7 @@
 #include <linux/interrupt.h>
 #include <linux/percpu.h>
 #include <linux/slab.h>
+#include <linux/ipipe.h>
 #include <linux/irqchip.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/irqchip/arm-gic.h>
@@ -88,9 +89,17 @@ struct gic_chip_data {
 #endif
 };
 
+#ifdef CONFIG_IPIPE
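+/* Hard-disable CPU interrupts around distributor updates when pipelining. */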
+#define pipeline_lock(__flags)		do { (__flags) = hard_local_irq_save(); } while (0)
+#define pipeline_unlock(__flags)	hard_local_irq_restore(__flags)
+#else
+#define pipeline_lock(__flags)		do { (void)__flags; } while (0)
+#define pipeline_unlock(__flags)	do { (void)__flags; } while (0)
+#endif
+
 #ifdef CONFIG_BL_SWITCHER
 
-static DEFINE_RAW_SPINLOCK(cpu_map_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(cpu_map_lock);
 
 #define gic_lock_irqsave(f)		\
 	raw_spin_lock_irqsave(&cpu_map_lock, (f))
@@ -201,7 +210,12 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
 
 static void gic_mask_irq(struct irq_data *d)
 {
+	unsigned long flags;
+
+	pipeline_lock(flags);
+	ipipe_lock_irq(d->irq);
 	gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR);
+	pipeline_unlock(flags);
 }
 
 static void gic_eoimode1_mask_irq(struct irq_data *d)
@@ -221,7 +235,12 @@ static void gic_eoimode1_mask_irq(struct irq_data *d)
 
 static void gic_unmask_irq(struct irq_data *d)
 {
+	unsigned long flags;
+
+	pipeline_lock(flags);
 	gic_poke_irq(d, GIC_DIST_ENABLE_SET);
+	ipipe_unlock_irq(d->irq);
+	pipeline_unlock(flags);
 }
 
 static void gic_eoi_irq(struct irq_data *d)
@@ -238,6 +257,27 @@ static void gic_eoimode1_eoi_irq(struct irq_data *d)
 	writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_DEACTIVATE);
 }
 
+#ifdef CONFIG_IPIPE
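+/*
+ * Hold disables the interrupt then completes it at the GIC level so
+ * the pipeline can replay it later; release re-enables it.
+ */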
+static void gic_hold_irq(struct irq_data *d)
+{
+	struct irq_chip *chip = irq_data_get_irq_chip(d);
+
+	gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR);
+
+	if (chip->irq_eoi == gic_eoimode1_eoi_irq) {
+		if (irqd_is_forwarded_to_vcpu(d))
+			gic_poke_irq(d, GIC_DIST_ACTIVE_CLEAR);
+		gic_eoimode1_eoi_irq(d);
+	} else
+		gic_eoi_irq(d);
+}
+
+static void gic_release_irq(struct irq_data *d)
+{
+	gic_poke_irq(d, GIC_DIST_ENABLE_SET);
+}
+#endif /* CONFIG_IPIPE */
+
 static int gic_irq_set_irqchip_state(struct irq_data *d,
 				     enum irqchip_irq_state which, bool val)
 {
@@ -361,7 +401,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 			if (static_branch_likely(&supports_deactivate_key))
 				writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
 			isb();
-			handle_domain_irq(gic->domain, irqnr, regs);
+			ipipe_handle_domain_irq(gic->domain, irqnr, regs);
 			continue;
 		}
 		if (irqnr < 16) {
@@ -377,7 +417,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 			 * Pairs with the write barrier in gic_raise_softirq
 			 */
 			smp_rmb();
-			handle_IPI(irqnr, regs);
+			ipipe_handle_multi_ipi(irqnr, regs);
 #endif
 			continue;
 		}
@@ -405,7 +445,7 @@ static void gic_handle_cascade_irq(struct irq_desc *desc)
 		handle_bad_irq(desc);
 	} else {
 		isb();
-		generic_handle_irq(cascade_irq);
+		ipipe_handle_demuxed_irq(cascade_irq);
 	}
 
  out:
@@ -417,11 +457,16 @@ static const struct irq_chip gic_chip = {
 	.irq_unmask		= gic_unmask_irq,
 	.irq_eoi		= gic_eoi_irq,
 	.irq_set_type		= gic_set_type,
+#ifdef CONFIG_IPIPE
+	.irq_hold		= gic_hold_irq,
+	.irq_release		= gic_release_irq,
+#endif
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
 	.flags			= IRQCHIP_SET_TYPE_MASKED |
 				  IRQCHIP_SKIP_SET_WAKE |
-				  IRQCHIP_MASK_ON_SUSPEND,
+				  IRQCHIP_MASK_ON_SUSPEND |
+				  IRQCHIP_PIPELINE_SAFE,
 };
 
 void __init gic_cascade_irq(unsigned int gic_nr, unsigned int irq)
@@ -479,7 +524,6 @@ static void gic_cpu_if_up(struct gic_chip_data *gic)
 	writel_relaxed(bypass | mode | GICC_ENABLE, cpu_base + GIC_CPU_CTRL);
 }
 
-
 static void gic_dist_init(struct gic_chip_data *gic)
 {
 	unsigned int i;
diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
index 4f74c15c4755..422fa2198e07 100644
--- a/drivers/irqchip/irq-imx-gpcv2.c
+++ b/drivers/irqchip/irq-imx-gpcv2.c
@@ -7,6 +7,7 @@
 #include <linux/of_irq.h>
 #include <linux/slab.h>
 #include <linux/irqchip.h>
+#include <linux/ipipe.h>
 #include <linux/syscore_ops.h>
 
 #define IMR_NUM			4
@@ -19,7 +20,11 @@
 
 
 struct gpcv2_irqchip_data {
+#ifdef CONFIG_IPIPE
+	ipipe_spinlock_t	rlock;
+#else
 	struct raw_spinlock	rlock;
+#endif
 	void __iomem		*gpc_base;
 	u32			wakeup_sources[IMR_NUM];
 	u32			saved_irq_mask[IMR_NUM];
@@ -36,6 +41,7 @@ static void __iomem *gpcv2_idx_to_reg(struct gpcv2_irqchip_data *cd, int i)
 static int gpcv2_wakeup_source_save(void)
 {
 	struct gpcv2_irqchip_data *cd;
+	unsigned long flags;
 	void __iomem *reg;
 	int i;
 
@@ -45,8 +51,10 @@ static int gpcv2_wakeup_source_save(void)
 
 	for (i = 0; i < IMR_NUM; i++) {
 		reg = gpcv2_idx_to_reg(cd, i);
+		flags = hard_cond_local_irq_save();
 		cd->saved_irq_mask[i] = readl_relaxed(reg);
 		writel_relaxed(cd->wakeup_sources[i], reg);
+		hard_cond_local_irq_restore(flags);
 	}
 
 	return 0;
@@ -55,14 +63,18 @@ static int gpcv2_wakeup_source_save(void)
 static void gpcv2_wakeup_source_restore(void)
 {
 	struct gpcv2_irqchip_data *cd;
+	unsigned long flags;
 	int i;
 
 	cd = imx_gpcv2_instance;
 	if (!cd)
 		return;
 
-	for (i = 0; i < IMR_NUM; i++)
+	for (i = 0; i < IMR_NUM; i++) {
+		flags = hard_cond_local_irq_save();
 		writel_relaxed(cd->saved_irq_mask[i], gpcv2_idx_to_reg(cd, i));
+		hard_cond_local_irq_restore(flags);
+	}
 }
 
 static struct syscore_ops imx_gpcv2_syscore_ops = {
@@ -92,38 +104,81 @@ static int imx_gpcv2_irq_set_wake(struct irq_data *d, unsigned int on)
 	return 0;
 }
 
-static void imx_gpcv2_irq_unmask(struct irq_data *d)
+static void __imx_gpcv2_irq_unmask(struct irq_data *d)
 {
 	struct gpcv2_irqchip_data *cd = d->chip_data;
 	void __iomem *reg;
 	u32 val;
 
-	raw_spin_lock(&cd->rlock);
 	reg = gpcv2_idx_to_reg(cd, d->hwirq / 32);
 	val = readl_relaxed(reg);
 	val &= ~BIT(d->hwirq % 32);
 	writel_relaxed(val, reg);
-	raw_spin_unlock(&cd->rlock);
 
-	irq_chip_unmask_parent(d);
 }
 
-static void imx_gpcv2_irq_mask(struct irq_data *d)
+static void imx_gpcv2_irq_unmask(struct irq_data *d)
+{
+	struct gpcv2_irqchip_data *cd = d->chip_data;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&cd->rlock, flags);
+	__imx_gpcv2_irq_unmask(d);
+	raw_spin_unlock_irqrestore(&cd->rlock, flags);
+	irq_chip_unmask_parent(d);
+}
+
+static void __imx_gpcv2_irq_mask(struct irq_data *d)
 {
 	struct gpcv2_irqchip_data *cd = d->chip_data;
 	void __iomem *reg;
 	u32 val;
 
-	raw_spin_lock(&cd->rlock);
 	reg = gpcv2_idx_to_reg(cd, d->hwirq / 32);
 	val = readl_relaxed(reg);
 	val |= BIT(d->hwirq % 32);
 	writel_relaxed(val, reg);
-	raw_spin_unlock(&cd->rlock);
 
-	irq_chip_mask_parent(d);
 }
 
+static void imx_gpcv2_irq_mask(struct irq_data *d)
+{
+	struct gpcv2_irqchip_data *cd = d->chip_data;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&cd->rlock, flags);
+	__imx_gpcv2_irq_mask(d);
+	raw_spin_unlock_irqrestore(&cd->rlock, flags);
+	irq_chip_mask_parent(d);
+}
+
+#ifdef CONFIG_IPIPE
+
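+/*
+ * Hold/release mirror mask/unmask, but propagate to the parent chip
+ * via irq_chip_hold_parent()/irq_chip_release_parent().
+ */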
+static void imx_gpc_hold_irq(struct irq_data *d)
+{
+	struct gpcv2_irqchip_data *cd = d->chip_data;
+
+	raw_spin_lock(&cd->rlock);
+	__imx_gpcv2_irq_mask(d);
+	raw_spin_unlock(&cd->rlock);
+	irq_chip_hold_parent(d);
+}
+
+static void imx_gpc_release_irq(struct irq_data *d)
+{
+	struct gpcv2_irqchip_data *cd = d->chip_data;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&cd->rlock, flags);
+	__imx_gpcv2_irq_unmask(d);
+	raw_spin_unlock_irqrestore(&cd->rlock, flags);
+	irq_chip_release_parent(d);
+}
+
+#endif /* CONFIG_IPIPE */
+
 static struct irq_chip gpcv2_irqchip_data_chip = {
 	.name			= "GPCv2",
 	.irq_eoi		= irq_chip_eoi_parent,
@@ -135,6 +190,11 @@ static struct irq_chip gpcv2_irqchip_data_chip = {
 #ifdef CONFIG_SMP
 	.irq_set_affinity	= irq_chip_set_affinity_parent,
 #endif
+#ifdef CONFIG_IPIPE
+	.irq_hold		= imx_gpc_hold_irq,
+	.irq_release		= imx_gpc_release_irq,
+#endif
+	.flags			= IRQCHIP_PIPELINE_SAFE,
 };
 
 static int imx_gpcv2_domain_translate(struct irq_domain *d,
diff --git a/drivers/irqchip/irq-omap-intc.c b/drivers/irqchip/irq-omap-intc.c
index d360a6eddd6d..1fbcda774458 100644
--- a/drivers/irqchip/irq-omap-intc.c
+++ b/drivers/irqchip/irq-omap-intc.c
@@ -15,6 +15,7 @@
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <asm/ipipe.h>
 
 #include <asm/exception.h>
 #include <linux/irqchip.h>
@@ -39,6 +40,7 @@
 #define INTC_MIR_CLEAR0		0x0088
 #define INTC_MIR_SET0		0x008c
 #define INTC_PENDING_IRQ0	0x0098
+#define INTC_PRIO		0x0100
 #define INTC_PENDING_IRQ1	0x00b8
 #define INTC_PENDING_IRQ2	0x00d8
 #define INTC_PENDING_IRQ3	0x00f8
@@ -49,6 +51,12 @@
 #define INTCPS_NR_ILR_REGS	128
 #define INTCPS_NR_MIR_REGS	4
 
+#if !defined(MULTI_OMAP1) && !defined(MULTI_OMAP2)
+#define inline_single inline
+#else
+#define inline_single
+#endif
+
 #define INTC_IDLE_FUNCIDLE	(1 << 0)
 #define INTC_IDLE_TURBO		(1 << 1)
 
@@ -69,12 +77,12 @@ static void __iomem *omap_irq_base;
 static int omap_nr_pending;
 static int omap_nr_irqs;
 
-static void intc_writel(u32 reg, u32 val)
+static inline_single void intc_writel(u32 reg, u32 val)
 {
 	writel_relaxed(val, omap_irq_base + reg);
 }
 
-static u32 intc_readl(u32 reg)
+static inline_single u32 intc_readl(u32 reg)
 {
 	return readl_relaxed(omap_irq_base + reg);
 }
@@ -137,9 +145,10 @@ void omap3_intc_resume_idle(void)
 }
 
 /* XXX: FIQ and additional INTC support (only MPU at the moment) */
-static void omap_ack_irq(struct irq_data *d)
+static inline_single void omap_ack_irq(struct irq_data *d)
 {
 	intc_writel(INTC_CONTROL, 0x1);
+	dsb();
 }
 
 static void omap_mask_ack_irq(struct irq_data *d)
@@ -164,8 +173,14 @@ static void __init omap_irq_soft_reset(void)
 	while (!(intc_readl(INTC_SYSSTATUS) & 0x1))
 		/* Wait for reset to complete */;
 
+#ifndef CONFIG_IPIPE
 	/* Enable autoidle */
 	intc_writel(INTC_SYSCONFIG, 1 << 0);
+#else /* CONFIG_IPIPE */
+	/* Disable autoidle */
+	intc_writel(INTC_SYSCONFIG, 0);
+	intc_writel(INTC_IDLE, 0x1);
+#endif /* CONFIG_IPIPE */
 }
 
 int omap_irq_pending(void)
@@ -211,7 +226,7 @@ static int __init omap_alloc_gc_of(struct irq_domain *d, void __iomem *base)
 		ct->chip.irq_mask = irq_gc_mask_disable_reg;
 		ct->chip.irq_unmask = irq_gc_unmask_enable_reg;
 
-		ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE;
+		ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE;
 
 		ct->regs.enable = INTC_MIR_CLEAR0 + 32 * i;
 		ct->regs.disable = INTC_MIR_SET0 + 32 * i;
@@ -231,8 +246,11 @@ static void __init omap_alloc_gc_legacy(void __iomem *base,
 	ct = gc->chip_types;
 	ct->chip.irq_ack = omap_mask_ack_irq;
 	ct->chip.irq_mask = irq_gc_mask_disable_reg;
+#ifdef CONFIG_IPIPE
+	ct->chip.irq_mask_ack = omap_mask_ack_irq;
+#endif
 	ct->chip.irq_unmask = irq_gc_unmask_enable_reg;
-	ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE;
+	ct->chip.flags |= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE;
 
 	ct->regs.enable = INTC_MIR_CLEAR0;
 	ct->regs.disable = INTC_MIR_SET0;
@@ -357,7 +375,7 @@ omap_intc_handle_irq(struct pt_regs *regs)
 	}
 
 	irqnr &= ACTIVEIRQ_MASK;
-	handle_domain_irq(domain, irqnr, regs);
+	ipipe_handle_domain_irq(domain, irqnr, regs);
 }
 
 static int __init intc_of_init(struct device_node *node,
@@ -387,6 +405,28 @@ static int __init intc_of_init(struct device_node *node,
 	return 0;
 }
 
+#if defined(CONFIG_IPIPE) && defined(CONFIG_ARCH_OMAP2PLUS)
+#if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM33XX)
+void omap3_intc_mute(void)
+{
+	intc_writel(INTC_THRESHOLD, 0x1);
+	intc_writel(INTC_CONTROL, 0x1);
+}
+
+void omap3_intc_unmute(void)
+{
+	intc_writel(INTC_THRESHOLD, 0xff);
+}
+
+void omap3_intc_set_irq_prio(int irq, int hi)
+{
+	if (irq >= INTCPS_NR_MIR_REGS * 32)
+		return;
+	intc_writel(INTC_PRIO + 4 * irq, hi ? 0 : 0xfc);
+}
+#endif /* CONFIG_ARCH_OMAP3 || CONFIG_SOC_AM33XX */
+#endif /* CONFIG_IPIPE && ARCH_OMAP2PLUS */
+
 IRQCHIP_DECLARE(omap2_intc, "ti,omap2-intc", intc_of_init);
 IRQCHIP_DECLARE(omap3_intc, "ti,omap3-intc", intc_of_init);
 IRQCHIP_DECLARE(dm814x_intc, "ti,dm814-intc", intc_of_init);
diff --git a/drivers/irqchip/irq-sunxi-nmi.c b/drivers/irqchip/irq-sunxi-nmi.c
index a412b5d5d0fa..015056bf4004 100644
--- a/drivers/irqchip/irq-sunxi-nmi.c
+++ b/drivers/irqchip/irq-sunxi-nmi.c
@@ -115,8 +115,9 @@ static int sunxi_sc_nmi_set_type(struct irq_data *data, unsigned int flow_type)
 	u32 ctrl_off = ct->regs.type;
 	unsigned int src_type;
 	unsigned int i;
+	unsigned long flags;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 
 	switch (flow_type & IRQF_TRIGGER_MASK) {
 	case IRQ_TYPE_EDGE_FALLING:
@@ -133,7 +134,7 @@ static int sunxi_sc_nmi_set_type(struct irq_data *data, unsigned int flow_type)
 		src_type = SUNXI_SRC_TYPE_LEVEL_LOW;
 		break;
 	default:
-		irq_gc_unlock(gc);
+		irq_gc_unlock(gc, flags);
 		pr_err("Cannot assign multiple trigger modes to IRQ %d.\n",
 			data->irq);
 		return -EBADR;
@@ -151,7 +152,7 @@ static int sunxi_sc_nmi_set_type(struct irq_data *data, unsigned int flow_type)
 	src_type_reg |= src_type;
 	sunxi_sc_nmi_write(gc, ctrl_off, src_type_reg);
 
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 
 	return IRQ_SET_MASK_OK;
 }
@@ -200,7 +201,7 @@ static int __init sunxi_sc_nmi_irq_init(struct device_node *node,
 	gc->chip_types[0].chip.irq_unmask	= irq_gc_mask_set_bit;
 	gc->chip_types[0].chip.irq_eoi		= irq_gc_ack_set_bit;
 	gc->chip_types[0].chip.irq_set_type	= sunxi_sc_nmi_set_type;
-	gc->chip_types[0].chip.flags		= IRQCHIP_EOI_THREADED | IRQCHIP_EOI_IF_HANDLED;
+	gc->chip_types[0].chip.flags		= IRQCHIP_EOI_THREADED | IRQCHIP_EOI_IF_HANDLED | IRQCHIP_PIPELINE_SAFE;
 	gc->chip_types[0].regs.ack		= reg_offs->pend;
 	gc->chip_types[0].regs.mask		= reg_offs->enable;
 	gc->chip_types[0].regs.type		= reg_offs->ctrl;
@@ -211,6 +212,7 @@ static int __init sunxi_sc_nmi_irq_init(struct device_node *node,
 	gc->chip_types[1].chip.irq_mask		= irq_gc_mask_clr_bit;
 	gc->chip_types[1].chip.irq_unmask	= irq_gc_mask_set_bit;
 	gc->chip_types[1].chip.irq_set_type	= sunxi_sc_nmi_set_type;
+	gc->chip_types[1].chip.flags		= IRQCHIP_PIPELINE_SAFE;
 	gc->chip_types[1].regs.ack		= reg_offs->pend;
 	gc->chip_types[1].regs.mask		= reg_offs->enable;
 	gc->chip_types[1].regs.type		= reg_offs->ctrl;
diff --git a/drivers/irqchip/irq-versatile-fpga.c b/drivers/irqchip/irq-versatile-fpga.c
index f1386733d3bc..78d3e441fb5d 100644
--- a/drivers/irqchip/irq-versatile-fpga.c
+++ b/drivers/irqchip/irq-versatile-fpga.c
@@ -85,7 +85,7 @@ static void fpga_irq_handle(struct irq_desc *desc)
 		unsigned int irq = ffs(status) - 1;
 
 		status &= ~(1 << irq);
-		generic_handle_irq(irq_find_mapping(f->domain, irq));
+		ipipe_handle_demuxed_irq(irq_find_mapping(f->domain, irq));
 	} while (status);
 
 out:
@@ -105,7 +105,7 @@ static int handle_one_fpga(struct fpga_irq_data *f, struct pt_regs *regs)
 
 	while ((status  = readl(f->base + IRQ_STATUS))) {
 		irq = ffs(status) - 1;
-		handle_domain_irq(f->domain, irq, regs);
+		ipipe_handle_domain_irq(f->domain, irq, regs);
 		handled = 1;
 	}
 
@@ -161,7 +161,11 @@ void __init fpga_irq_init(void __iomem *base, const char *name, int irq_start,
 	f->chip.name = name;
 	f->chip.irq_ack = fpga_irq_mask;
 	f->chip.irq_mask = fpga_irq_mask;
+#ifdef CONFIG_IPIPE
+	f->chip.irq_mask_ack = fpga_irq_mask;
+#endif
 	f->chip.irq_unmask = fpga_irq_unmask;
+	f->chip.flags = IRQCHIP_PIPELINE_SAFE;
 	f->valid = valid;
 
 	if (parent_irq != -1) {
diff --git a/drivers/irqchip/irq-vic.c b/drivers/irqchip/irq-vic.c
index f3f20a3cff50..9ed65876349f 100644
--- a/drivers/irqchip/irq-vic.c
+++ b/drivers/irqchip/irq-vic.c
@@ -21,6 +21,7 @@
 #include <linux/device.h>
 #include <linux/amba/bus.h>
 #include <linux/irqchip/arm-vic.h>
+#include <linux/ipipe.h>
 
 #include <asm/exception.h>
 #include <asm/irq.h>
@@ -205,7 +206,7 @@ static int handle_one_vic(struct vic_device *vic, struct pt_regs *regs)
 
 	while ((stat = readl_relaxed(vic->base + VIC_IRQ_STATUS))) {
 		irq = ffs(stat) - 1;
-		handle_domain_irq(vic->domain, irq, regs);
+		ipipe_handle_domain_irq(vic->domain, irq, regs);
 		handled = 1;
 	}
 
@@ -222,7 +223,7 @@ static void vic_handle_irq_cascaded(struct irq_desc *desc)
 
 	while ((stat = readl_relaxed(vic->base + VIC_IRQ_STATUS))) {
 		hwirq = ffs(stat) - 1;
-		generic_handle_irq(irq_find_mapping(vic->domain, hwirq));
+		ipipe_handle_demuxed_irq(irq_find_mapping(vic->domain, hwirq));
 	}
 
 	chained_irq_exit(host_chip, desc);
@@ -326,7 +327,7 @@ static void vic_unmask_irq(struct irq_data *d)
 #if defined(CONFIG_PM)
 static struct vic_device *vic_from_irq(unsigned int irq)
 {
-        struct vic_device *v = vic_devices;
+	struct vic_device *v = vic_devices;
 	unsigned int base_irq = irq & ~31;
 	int id;
 
@@ -365,8 +366,12 @@ static struct irq_chip vic_chip = {
 	.name		= "VIC",
 	.irq_ack	= vic_ack_irq,
 	.irq_mask	= vic_mask_irq,
+#ifdef CONFIG_IPIPE
+	.irq_mask_ack	= vic_ack_irq,
+#endif /* CONFIG_IPIPE */
 	.irq_unmask	= vic_unmask_irq,
 	.irq_set_wake	= vic_set_wake,
+	.flags		= IRQCHIP_PIPELINE_SAFE,
 };
 
 static void __init vic_disable(void __iomem *base)
diff --git a/drivers/memory/omap-gpmc.c b/drivers/memory/omap-gpmc.c
index 27bc417029e1..83b366ae05ed 100644
--- a/drivers/memory/omap-gpmc.c
+++ b/drivers/memory/omap-gpmc.c
@@ -1259,12 +1259,15 @@ int gpmc_get_client_irq(unsigned irq_config)
 
 static int gpmc_irq_endis(unsigned long hwirq, bool endis)
 {
+	unsigned long flags;
 	u32 regval;
 
 	/* bits GPMC_NR_NAND_IRQS to 8 are reserved */
 	if (hwirq >= GPMC_NR_NAND_IRQS)
 		hwirq += 8 - GPMC_NR_NAND_IRQS;
 
+	flags = hard_local_irq_save();
+
 	regval = gpmc_read_reg(GPMC_IRQENABLE);
 	if (endis)
 		regval |= BIT(hwirq);
@@ -1272,6 +1275,8 @@ static int gpmc_irq_endis(unsigned long hwirq, bool endis)
 		regval &= ~BIT(hwirq);
 	gpmc_write_reg(GPMC_IRQENABLE, regval);
 
+	hard_local_irq_restore(flags);
+
 	return 0;
 }
 
@@ -1297,6 +1302,7 @@ static void gpmc_irq_unmask(struct irq_data *d)
 
 static void gpmc_irq_edge_config(unsigned long hwirq, bool rising_edge)
 {
+	unsigned long flags;
 	u32 regval;
 
 	/* NAND IRQs polarity is not configurable */
@@ -1306,6 +1312,8 @@ static void gpmc_irq_edge_config(unsigned long hwirq, bool rising_edge)
 	/* WAITPIN starts at BIT 8 */
 	hwirq += 8 - GPMC_NR_NAND_IRQS;
 
+	flags = hard_local_irq_save();
+
 	regval = gpmc_read_reg(GPMC_CONFIG);
 	if (rising_edge)
 		regval &= ~BIT(hwirq);
@@ -1313,6 +1321,8 @@ static void gpmc_irq_edge_config(unsigned long hwirq, bool rising_edge)
 		regval |= BIT(hwirq);
 
 	gpmc_write_reg(GPMC_CONFIG, regval);
+
+	hard_local_irq_restore(flags);
 }
 
 static void gpmc_irq_ack(struct irq_data *d)
@@ -1392,7 +1402,7 @@ static irqreturn_t gpmc_handle_irq(int irq, void *data)
 					 hwirq, virq);
 			}
 
-			generic_handle_irq(virq);
+			ipipe_handle_demuxed_irq(virq);
 		}
 	}
 
@@ -1420,6 +1430,7 @@ static int gpmc_setup_irq(struct gpmc_device *gpmc)
 	gpmc->irq_chip.irq_mask = gpmc_irq_mask;
 	gpmc->irq_chip.irq_unmask = gpmc_irq_unmask;
 	gpmc->irq_chip.irq_set_type = gpmc_irq_set_type;
+	gpmc->irq_chip.flags |= IRQCHIP_PIPELINE_SAFE;
 
 	gpmc_irq_domain = irq_domain_add_linear(gpmc->dev->of_node,
 						gpmc->nirqs,
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index fbcb211cceb4..c9fd4e4966ba 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -66,6 +66,7 @@ static struct irq_chip dw_pcie_msi_irq_chip = {
 	.irq_ack = dw_msi_ack_irq,
 	.irq_mask = dw_msi_mask_irq,
 	.irq_unmask = dw_msi_unmask_irq,
+	.flags = IRQCHIP_PIPELINE_SAFE,
 };
 
 static struct msi_domain_info dw_pcie_msi_domain_info = {
diff --git a/drivers/pci/controller/pcie-altera.c b/drivers/pci/controller/pcie-altera.c
index d2497ca43828..9c5b64a59973 100644
--- a/drivers/pci/controller/pcie-altera.c
+++ b/drivers/pci/controller/pcie-altera.c
@@ -661,7 +661,7 @@ static void altera_pcie_isr(struct irq_desc *desc)
 
 			virq = irq_find_mapping(pcie->irq_domain, bit);
 			if (virq)
-				generic_handle_irq(virq);
+				ipipe_handle_demuxed_irq(virq);
 			else
 				dev_err(dev, "unexpected IRQ, INT%d\n", bit);
 		}
diff --git a/drivers/pinctrl/bcm/pinctrl-bcm2835.c b/drivers/pinctrl/bcm/pinctrl-bcm2835.c
index 0de1a3a96984..3f2617d25a94 100644
--- a/drivers/pinctrl/bcm/pinctrl-bcm2835.c
+++ b/drivers/pinctrl/bcm/pinctrl-bcm2835.c
@@ -18,6 +18,7 @@
 #include <linux/io.h>
 #include <linux/irq.h>
 #include <linux/irqdesc.h>
+#include <linux/ipipe.h>
 #include <linux/init.h>
 #include <linux/of_address.h>
 #include <linux/of.h>
@@ -87,7 +88,11 @@ struct bcm2835_pinctrl {
 	struct gpio_chip gpio_chip;
 	struct pinctrl_gpio_range gpio_range;
 
+#ifdef CONFIG_IPIPE
+	ipipe_spinlock_t irq_lock[BCM2835_NUM_BANKS];
+#else
 	raw_spinlock_t irq_lock[BCM2835_NUM_BANKS];
+#endif
 };
 
 /* pins are just named GPIO0..GPIO53 */
@@ -367,7 +372,7 @@ static void bcm2835_gpio_irq_handle_bank(struct bcm2835_pinctrl *pc,
 	events &= pc->enabled_irq_map[bank];
 	for_each_set_bit(offset, &events, 32) {
 		gpio = (32 * bank) + offset;
-		generic_handle_irq(irq_linear_revmap(pc->gpio_chip.irq.domain,
+		ipipe_handle_demuxed_irq(irq_linear_revmap(pc->gpio_chip.irq.domain,
 						     gpio));
 	}
 }
@@ -462,6 +467,7 @@ static void bcm2835_gpio_irq_enable(struct irq_data *data)
 	raw_spin_lock_irqsave(&pc->irq_lock[bank], flags);
 	set_bit(offset, &pc->enabled_irq_map[bank]);
 	bcm2835_gpio_irq_config(pc, gpio, true);
+	ipipe_unlock_irq(data->irq);
 	raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
 }
 
@@ -479,6 +485,7 @@ static void bcm2835_gpio_irq_disable(struct irq_data *data)
 	/* Clear events that were latched prior to clearing event sources */
 	bcm2835_gpio_set_bit(pc, GPEDS0, gpio);
 	clear_bit(offset, &pc->enabled_irq_map[bank]);
+	ipipe_lock_irq(data->irq);
 	raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
 }
 
@@ -608,6 +615,39 @@ static void bcm2835_gpio_irq_ack(struct irq_data *data)
 	bcm2835_gpio_set_bit(pc, GPEDS0, gpio);
 }
 
+#ifdef CONFIG_IPIPE
+
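+/*
+ * Hold disables event detection for the pin, clears any latched
+ * event and drops it from the enabled map; release re-arms it.
+ */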
+static void bcm2835_gpio_irq_hold(struct irq_data *data)
+{
+	struct bcm2835_pinctrl *pc = irq_data_get_irq_chip_data(data);
+	unsigned gpio = irqd_to_hwirq(data);
+	unsigned offset = GPIO_REG_SHIFT(gpio);
+	unsigned bank = GPIO_REG_OFFSET(gpio);
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&pc->irq_lock[bank], flags);
+	bcm2835_gpio_irq_config(pc, gpio, false);
+	bcm2835_gpio_set_bit(pc, GPEDS0, gpio);
+	clear_bit(offset, &pc->enabled_irq_map[bank]);
+	raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
+}
+
+static void bcm2835_gpio_irq_release(struct irq_data *data)
+{
+	struct bcm2835_pinctrl *pc = irq_data_get_irq_chip_data(data);
+	unsigned gpio = irqd_to_hwirq(data);
+	unsigned offset = GPIO_REG_SHIFT(gpio);
+	unsigned bank = GPIO_REG_OFFSET(gpio);
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&pc->irq_lock[bank], flags);
+	set_bit(offset, &pc->enabled_irq_map[bank]);
+	bcm2835_gpio_irq_config(pc, gpio, true);
+	raw_spin_unlock_irqrestore(&pc->irq_lock[bank], flags);
+}
+
+#endif
+
 static struct irq_chip bcm2835_gpio_irq_chip = {
 	.name = MODULE_NAME,
 	.irq_enable = bcm2835_gpio_irq_enable,
@@ -616,6 +656,11 @@ static struct irq_chip bcm2835_gpio_irq_chip = {
 	.irq_ack = bcm2835_gpio_irq_ack,
 	.irq_mask = bcm2835_gpio_irq_disable,
 	.irq_unmask = bcm2835_gpio_irq_enable,
+#ifdef CONFIG_IPIPE
+	.irq_hold = bcm2835_gpio_irq_hold,
+	.irq_release = bcm2835_gpio_irq_release,
+#endif
+	.flags = IRQCHIP_PIPELINE_SAFE,
 };
 
 static int bcm2835_pctl_get_groups_count(struct pinctrl_dev *pctldev)
diff --git a/drivers/pinctrl/intel/pinctrl-intel.c b/drivers/pinctrl/intel/pinctrl-intel.c
index 83981ad66a71..6bbaa05d16c1 100644
--- a/drivers/pinctrl/intel/pinctrl-intel.c
+++ b/drivers/pinctrl/intel/pinctrl-intel.c
@@ -1135,7 +1135,9 @@ static irqreturn_t intel_gpio_community_irq_handler(struct intel_pinctrl *pctrl,
 
 			irq = irq_find_mapping(gc->irq.domain,
 					       padgrp->gpio_base + gpp_offset);
-			generic_handle_irq(irq);
+			hard_cond_local_irq_disable();
+			ipipe_handle_demuxed_irq(irq);
+			hard_cond_local_irq_enable();
 
 			ret |= IRQ_HANDLED;
 		}
@@ -1223,7 +1225,7 @@ static int intel_gpio_probe(struct intel_pinctrl *pctrl, int irq)
 	pctrl->irqchip.irq_unmask = intel_gpio_irq_unmask;
 	pctrl->irqchip.irq_set_type = intel_gpio_irq_type;
 	pctrl->irqchip.irq_set_wake = intel_gpio_irq_wake;
-	pctrl->irqchip.flags = IRQCHIP_MASK_ON_SUSPEND;
+	pctrl->irqchip.flags = IRQCHIP_MASK_ON_SUSPEND | IRQCHIP_PIPELINE_SAFE;
 
 	ret = devm_gpiochip_add_data(pctrl->dev, &pctrl->chip, pctrl);
 	if (ret) {
diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c
index 1bd8840e11a6..b0a055687d8c 100644
--- a/drivers/pinctrl/pinctrl-rockchip.c
+++ b/drivers/pinctrl/pinctrl-rockchip.c
@@ -2902,7 +2902,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
 	u32 polarity;
 	u32 level;
 	u32 data;
-	unsigned long flags;
+	unsigned long flags, flags2;
 	int ret;
 
 	/* make sure the pin is configured as gpio input */
@@ -2925,7 +2925,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
 		irq_set_handler_locked(d, handle_level_irq);
 
 	raw_spin_lock_irqsave(&bank->slock, flags);
-	irq_gc_lock(gc);
+	flags2 = irq_gc_lock(gc);
 
 	level = readl_relaxed(gc->reg_base + GPIO_INTTYPE_LEVEL);
 	polarity = readl_relaxed(gc->reg_base + GPIO_INT_POLARITY);
@@ -2966,7 +2966,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
 		polarity &= ~mask;
 		break;
 	default:
-		irq_gc_unlock(gc);
+		irq_gc_unlock(gc, flags2);
 		raw_spin_unlock_irqrestore(&bank->slock, flags);
 		clk_disable(bank->clk);
 		return -EINVAL;
@@ -2975,7 +2975,7 @@ static int rockchip_irq_set_type(struct irq_data *d, unsigned int type)
 	writel_relaxed(level, gc->reg_base + GPIO_INTTYPE_LEVEL);
 	writel_relaxed(polarity, gc->reg_base + GPIO_INT_POLARITY);
 
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags2);
 	raw_spin_unlock_irqrestore(&bank->slock, flags);
 	clk_disable(bank->clk);
 
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index a9d511982780..9de02349701b 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -16,6 +16,7 @@
 #include <linux/err.h>
 #include <linux/list.h>
 #include <linux/interrupt.h>
+#include <linux/ipipe.h>
 
 #include <linux/irqchip/chained_irq.h>
 
@@ -185,7 +186,11 @@ struct pcs_device {
 #define PCS_FEAT_PINCONF	(1 << 0)
 	struct property *missing_nr_pinctrl_cells;
 	struct pcs_soc_data socdata;
+#ifdef CONFIG_IPIPE
+	ipipe_spinlock_t lock;
+#else /* !IPIPE */
 	raw_spinlock_t lock;
+#endif /* !IPIPE */
 	struct mutex mutex;
 	unsigned width;
 	unsigned fmask;
@@ -1463,7 +1468,7 @@ static int pcs_irq_handle(struct pcs_soc_data *pcs_soc)
 		mask = pcs->read(pcswi->reg);
 		raw_spin_unlock(&pcs->lock);
 		if (mask & pcs_soc->irq_status_mask) {
-			generic_handle_irq(irq_find_mapping(pcs->domain,
+			ipipe_handle_demuxed_irq(irq_find_mapping(pcs->domain,
 							    pcswi->hwirq));
 			count++;
 		}
@@ -1483,8 +1488,14 @@ static int pcs_irq_handle(struct pcs_soc_data *pcs_soc)
 static irqreturn_t pcs_irq_handler(int irq, void *d)
 {
 	struct pcs_soc_data *pcs_soc = d;
+	unsigned long flags;
+	irqreturn_t ret;
 
-	return pcs_irq_handle(pcs_soc) ? IRQ_HANDLED : IRQ_NONE;
+	flags = hard_cond_local_irq_save();
+	ret = pcs_irq_handle(pcs_soc) ? IRQ_HANDLED : IRQ_NONE;
+	hard_cond_local_irq_restore(flags);
+
+	return ret;
 }
 
 /**
diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
index 0cbca30b75dc..2874577f9fee 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
@@ -15,6 +15,7 @@
 #include <linux/gpio/driver.h>
 #include <linux/irqdomain.h>
 #include <linux/irqchip/chained_irq.h>
+#include <linux/ipipe.h>
 #include <linux/export.h>
 #include <linux/of.h>
 #include <linux/of_clk.h>
@@ -1066,14 +1067,33 @@ static struct irq_chip sunxi_pinctrl_edge_irq_chip = {
 	.irq_request_resources = sunxi_pinctrl_irq_request_resources,
 	.irq_release_resources = sunxi_pinctrl_irq_release_resources,
 	.irq_set_type	= sunxi_pinctrl_irq_set_type,
-	.flags		= IRQCHIP_SKIP_SET_WAKE,
+	.flags		= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
 };
 
+#ifdef CONFIG_IPIPE
+
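+/* Hold masks then acks the pin interrupt; release simply unmasks it. */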
+static void sunxi_pinctrl_irq_hold(struct irq_data *d)
+{
+	sunxi_pinctrl_irq_mask(d);
+	sunxi_pinctrl_irq_ack(d);
+}
+
+static void sunxi_pinctrl_irq_release(struct irq_data *d)
+{
+	sunxi_pinctrl_irq_unmask(d);
+}
+
+#endif
+
 static struct irq_chip sunxi_pinctrl_level_irq_chip = {
 	.name		= "sunxi_pio_level",
 	.irq_eoi	= sunxi_pinctrl_irq_ack,
 	.irq_mask	= sunxi_pinctrl_irq_mask,
 	.irq_unmask	= sunxi_pinctrl_irq_unmask,
+#ifdef CONFIG_IPIPE
+	.irq_hold	= sunxi_pinctrl_irq_hold,
+	.irq_release	= sunxi_pinctrl_irq_release,
+#endif
 	/* Define irq_enable / disable to avoid spurious irqs for drivers
 	 * using these to suppress irqs while they clear the irq source */
 	.irq_enable	= sunxi_pinctrl_irq_ack_unmask,
@@ -1082,7 +1102,7 @@ static struct irq_chip sunxi_pinctrl_level_irq_chip = {
 	.irq_release_resources = sunxi_pinctrl_irq_release_resources,
 	.irq_set_type	= sunxi_pinctrl_irq_set_type,
 	.flags		= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_EOI_THREADED |
-			  IRQCHIP_EOI_IF_HANDLED,
+			  IRQCHIP_EOI_IF_HANDLED | IRQCHIP_PIPELINE_SAFE,
 };
 
 static int sunxi_pinctrl_irq_of_xlate(struct irq_domain *d,
@@ -1140,7 +1160,7 @@ static void sunxi_pinctrl_irq_handler(struct irq_desc *desc)
 		for_each_set_bit(irqoffset, &val, IRQ_PER_BANK) {
 			int pin_irq = irq_find_mapping(pctl->domain,
 						       bank * IRQ_PER_BANK + irqoffset);
-			generic_handle_irq(pin_irq);
+			ipipe_handle_demuxed_irq(pin_irq);
 		}
 		chained_irq_exit(chip, desc);
 	}
diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.h b/drivers/pinctrl/sunxi/pinctrl-sunxi.h
index a32bb5bcb754..f1139039d2b0 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.h
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.h
@@ -167,7 +167,11 @@ struct sunxi_pinctrl {
 	unsigned			ngroups;
 	int				*irq;
 	unsigned			*irq_array;
+#ifdef CONFIG_IPIPE
+	ipipe_spinlock_t		lock;
+#else
 	raw_spinlock_t			lock;
+#endif
 	struct pinctrl_dev		*pctl_dev;
 	unsigned long			variant;
 };
diff --git a/drivers/soc/dove/pmu.c b/drivers/soc/dove/pmu.c
index ffc5311c0ed8..ad379f0f65e8 100644
--- a/drivers/soc/dove/pmu.c
+++ b/drivers/soc/dove/pmu.c
@@ -16,6 +16,7 @@
 #include <linux/slab.h>
 #include <linux/soc/dove/pmu.h>
 #include <linux/spinlock.h>
+#include <linux/ipipe.h>
 
 #define NR_PMU_IRQS		7
 
@@ -231,6 +232,7 @@ static void pmu_irq_handler(struct irq_desc *desc)
 	void __iomem *base = gc->reg_base;
 	u32 stat = readl_relaxed(base + PMC_IRQ_CAUSE) & gc->mask_cache;
 	u32 done = ~0;
+	unsigned long flags;
 
 	if (stat == 0) {
 		handle_bad_irq(desc);
@@ -243,7 +245,7 @@ static void pmu_irq_handler(struct irq_desc *desc)
 		stat &= ~(1 << hwirq);
 		done &= ~(1 << hwirq);
 
-		generic_handle_irq(irq_find_mapping(domain, hwirq));
+		ipipe_handle_demuxed_irq(irq_find_mapping(domain, hwirq));
 	}
 
 	/*
@@ -257,10 +259,10 @@ static void pmu_irq_handler(struct irq_desc *desc)
 	 * So, let's structure the code so that the window is as small as
 	 * possible.
 	 */
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	done &= readl_relaxed(base + PMC_IRQ_CAUSE);
 	writel_relaxed(done, base + PMC_IRQ_CAUSE);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 static int __init dove_init_pmu_irq(struct pmu_data *pmu, int irq)
@@ -296,6 +298,7 @@ static int __init dove_init_pmu_irq(struct pmu_data *pmu, int irq)
 	gc->chip_types[0].regs.mask = PMC_IRQ_MASK;
 	gc->chip_types[0].chip.irq_mask = irq_gc_mask_clr_bit;
 	gc->chip_types[0].chip.irq_unmask = irq_gc_mask_set_bit;
+	gc->chip_types[0].chip.flags |= IRQCHIP_PIPELINE_SAFE;
 
 	pmu->irq_domain = domain;
 	pmu->irq_gc = gc;
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 2675771a03a0..cf293151b4e6 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -586,6 +586,48 @@ static void univ8250_console_write(struct console *co, const char *s,
 	serial8250_console_write(up, s, count);
 }
 
+#ifdef CONFIG_RAW_PRINTK
+
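+/*
+ * Poll LSR.THRE with a bounded timeout before pushing each character;
+ * no port locking is involved, so this path remains usable from
+ * out-of-band contexts.
+ */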
+static void raw_write_char(struct uart_8250_port *up, int c)
+{
+	unsigned int status, tmout = 10000;
+
+	for (;;) {
+		status = serial_in(up, UART_LSR);
+		up->lsr_saved_flags |= status & LSR_SAVE_FLAGS;
+		if ((status & UART_LSR_THRE) == UART_LSR_THRE)
+			break;
+		if (--tmout == 0)
+			break;
+		cpu_relax();
+	}
+	serial_port_out(&up->port, UART_TX, c);
+}
+
+static void univ8250_console_write_raw(struct console *co, const char *s,
+				       unsigned int count)
+{
+	struct uart_8250_port *up = &serial8250_ports[co->index];
+	unsigned int ier;
+
+	ier = serial_in(up, UART_IER);
+
+	if (up->capabilities & UART_CAP_UUE)
+		serial_out(up, UART_IER, UART_IER_UUE);
+	else
+		serial_out(up, UART_IER, 0);
+
+	while (count-- > 0) {
+		if (*s == '\n')
+			raw_write_char(up, '\r');
+		raw_write_char(up, *s++);
+	}
+
+	serial_out(up, UART_IER, ier);
+}
+
+#endif
+
 static int univ8250_console_setup(struct console *co, char *options)
 {
 	struct uart_port *port;
@@ -667,7 +709,12 @@ static struct console univ8250_console = {
 	.device		= uart_console_device,
 	.setup		= univ8250_console_setup,
 	.match		= univ8250_console_match,
+#ifdef CONFIG_RAW_PRINTK
+	.write_raw	= univ8250_console_write_raw,
+	.flags		= CON_PRINTBUFFER | CON_ANYTIME | CON_RAW,
+#else
 	.flags		= CON_PRINTBUFFER | CON_ANYTIME,
+#endif
 	.index		= -1,
 	.data		= &serial8250_reg,
 };
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 16720c97a4dd..15a288646fc4 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2207,6 +2207,42 @@ static void pl011_console_putchar(struct uart_port *port, int ch)
 	pl011_write(ch, uap, REG_DR);
 }
 
+#ifdef CONFIG_RAW_PRINTK
+
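+/*
+ * With raw console support, the port clock is prepared and enabled
+ * once at setup time and left on, so the raw write path never has to
+ * touch the clock from whatever context invokes it.
+ */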
+#define pl011_clk_setup(clk)	clk_prepare_enable(clk)
+#define pl011_clk_enable(clk)	do { } while (0)
+#define pl011_clk_disable(clk)	do { } while (0)
+
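+/*
+ * Raw console output: force the UART and transmitter on with CTS flow
+ * control disabled, poll each character out, wait for the FIFO to
+ * drain, then restore the original control register.
+ */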
+static void
+pl011_console_write_raw(struct console *co, const char *s, unsigned int count)
+{
+	struct uart_amba_port *uap = amba_ports[co->index];
+	unsigned int old_cr, new_cr, status;
+
+	old_cr = readw(uap->port.membase + UART011_CR);
+	new_cr = old_cr & ~UART011_CR_CTSEN;
+	new_cr |= UART01x_CR_UARTEN | UART011_CR_TXE;
+	writew(new_cr, uap->port.membase + UART011_CR);
+
+	while (count-- > 0) {
+		if (*s == '\n')
+			pl011_console_putchar(&uap->port, '\r');
+		pl011_console_putchar(&uap->port, *s++);
+	}
+	do {
+		status = readw(uap->port.membase + UART01x_FR);
+	} while (status & UART01x_FR_BUSY);
+	writew(old_cr, uap->port.membase + UART011_CR);
+}
+
+#else  /* !CONFIG_RAW_PRINTK */
+
+#define pl011_clk_setup(clk)	clk_prepare(clk)
+#define pl011_clk_enable(clk)	clk_enable(clk)
+#define pl011_clk_disable(clk)	clk_disable(clk)
+
+#endif  /* !CONFIG_RAW_PRINTK */
+
 static void
 pl011_console_write(struct console *co, const char *s, unsigned int count)
 {
@@ -2215,7 +2251,7 @@ pl011_console_write(struct console *co, const char *s, unsigned int count)
 	unsigned long flags;
 	int locked = 1;
 
-	clk_enable(uap->clk);
+	pl011_clk_enable(uap->clk);
 
 	local_irq_save(flags);
 	if (uap->port.sysrq)
@@ -2252,7 +2288,7 @@ pl011_console_write(struct console *co, const char *s, unsigned int count)
 		spin_unlock(&uap->port.lock);
 	local_irq_restore(flags);
 
-	clk_disable(uap->clk);
+	pl011_clk_disable(uap->clk);
 }
 
 static void pl011_console_get_options(struct uart_amba_port *uap, int *baud,
@@ -2312,7 +2348,7 @@ static int pl011_console_setup(struct console *co, char *options)
 	/* Allow pins to be muxed in and configured */
 	pinctrl_pm_select_default_state(uap->port.dev);
 
-	ret = clk_prepare(uap->clk);
+	ret = pl011_clk_setup(uap->clk);
 	if (ret)
 		return ret;
 
@@ -2406,7 +2442,12 @@ static struct console amba_console = {
 	.device		= uart_console_device,
 	.setup		= pl011_console_setup,
 	.match		= pl011_console_match,
+#ifdef CONFIG_RAW_PRINTK
+	.write_raw	= pl011_console_write_raw,
+	.flags		= CON_PRINTBUFFER | CON_RAW | CON_ANYTIME,
+#else
 	.flags		= CON_PRINTBUFFER | CON_ANYTIME,
+#endif
 	.index		= -1,
 	.data		= &amba_reg,
 };
diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index 9359c80fbb9f..298405deefc5 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -1233,6 +1233,34 @@ static void cdns_uart_console_write(struct console *co, const char *s,
 		spin_unlock_irqrestore(&port->lock, flags);
 }
 
+#ifdef CONFIG_RAW_PRINTK
+
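+/*
+ * Raw console output: mask the controller interrupts via IDR, make
+ * sure the transmitter is enabled, poll the characters out through
+ * the FIFO, then restore the control and interrupt mask registers.
+ */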
+static void cdns_uart_console_write_raw(struct console *co, const char *s,
+					unsigned int count)
+{
+	struct uart_port *port = &cdns_uart_port[co->index];
+	unsigned int imr, ctrl;
+
+	imr = readl(port->membase + CDNS_UART_IMR);
+	writel(imr, port->membase + CDNS_UART_IDR);
+
+	ctrl = readl(port->membase + CDNS_UART_CR);
+	ctrl &= ~CDNS_UART_CR_TX_DIS;
+	ctrl |= CDNS_UART_CR_TX_EN;
+	writel(ctrl, port->membase + CDNS_UART_CR);
+
+	while (count-- > 0) {
+		if (*s == '\n')
+			writel('\r', port->membase + CDNS_UART_FIFO);
+		writel(*s++, port->membase + CDNS_UART_FIFO);
+	}
+
+	writel(ctrl, port->membase + CDNS_UART_CR);
+	writel(imr, port->membase + CDNS_UART_IER);
+}
+
+#endif
+
 /**
  * cdns_uart_console_setup - Initialize the uart to default config
  * @co: Console handle
@@ -1274,7 +1302,12 @@ static struct console cdns_uart_console = {
 	.write	= cdns_uart_console_write,
 	.device	= uart_console_device,
 	.setup	= cdns_uart_console_setup,
+#ifdef CONFIG_RAW_PRINTK
+	.write_raw = cdns_uart_console_write_raw,
+	.flags	= CON_PRINTBUFFER | CON_RAW,
+#else
 	.flags	= CON_PRINTBUFFER,
+#endif
 	.index	= -1, /* Specified on the cmdline (e.g. console=ttyPS ) */
 	.data	= &cdns_uart_uart_driver,
 };
diff --git a/fs/exec.c b/fs/exec.c
index 2441eb1a1e2d..a7996f20805c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -50,6 +50,7 @@
 #include <linux/module.h>
 #include <linux/namei.h>
 #include <linux/mount.h>
+#include <linux/ipipe.h>
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/tsacct_kern.h>
@@ -1016,6 +1017,7 @@ static int exec_mmap(struct mm_struct *mm)
 {
 	struct task_struct *tsk;
 	struct mm_struct *old_mm, *active_mm;
+	unsigned long flags;
 	int ret;
 
 	/* Notify parent that we're no longer interested in the old VM */
@@ -1047,6 +1049,7 @@ static int exec_mmap(struct mm_struct *mm)
 	membarrier_exec_mmap(mm);
 
 	local_irq_disable();
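+	/*
+	 * Guard the mm switch against preemption from the head stage:
+	 * this hard disables IRQs, unless the pipeline supports
+	 * preemptible context switching, in which case it is a no-op.
+	 */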
+	ipipe_mm_switch_protect(flags);
 	active_mm = tsk->active_mm;
 	tsk->active_mm = mm;
 	tsk->mm = mm;
@@ -1057,9 +1060,12 @@ static int exec_mmap(struct mm_struct *mm)
 	 * switches. Not all architectures can handle irqs off over
 	 * activate_mm yet.
 	 */
-	if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
+	if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM) &&
+	    (!IS_ENABLED(CONFIG_IPIPE) ||
+	     IS_ENABLED(CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH)))
 		local_irq_enable();
 	activate_mm(active_mm, mm);
+	ipipe_mm_switch_unprotect(flags);
 	if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
 		local_irq_enable();
 	tsk->mm->vmacache_seqnum = 0;
diff --git a/include/asm-generic/atomic.h b/include/asm-generic/atomic.h
index 286867f593d2..2e7a4dd9790a 100644
--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -76,9 +76,9 @@ static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
 	unsigned long flags;						\
 									\
-	raw_local_irq_save(flags);					\
+	flags = hard_local_irq_save();					\
 	v->counter = v->counter c_op i;					\
-	raw_local_irq_restore(flags);					\
+	hard_local_irq_restore(flags);					\
 }
 
 #define ATOMIC_OP_RETURN(op, c_op)					\
@@ -87,9 +87,9 @@ static inline int atomic_##op##_return(int i, atomic_t *v)		\
 	unsigned long flags;						\
 	int ret;							\
 									\
-	raw_local_irq_save(flags);					\
+	flags = hard_local_irq_save();					\
 	ret = (v->counter = v->counter c_op i);				\
-	raw_local_irq_restore(flags);					\
+	hard_local_irq_restore(flags);					\
 									\
 	return ret;							\
 }
@@ -100,10 +100,10 @@ static inline int atomic_fetch_##op(int i, atomic_t *v)			\
 	unsigned long flags;						\
 	int ret;							\
 									\
-	raw_local_irq_save(flags);					\
+	flags = hard_local_irq_save();					\
 	ret = v->counter;						\
 	v->counter = v->counter c_op i;					\
-	raw_local_irq_restore(flags);					\
+	hard_local_irq_restore(flags);					\
 									\
 	return ret;							\
 }
diff --git a/include/asm-generic/cmpxchg-local.h b/include/asm-generic/cmpxchg-local.h
index f17f14f84d09..e05f37fb7158 100644
--- a/include/asm-generic/cmpxchg-local.h
+++ b/include/asm-generic/cmpxchg-local.h
@@ -4,6 +4,7 @@
 
 #include <linux/types.h>
 #include <linux/irqflags.h>
+#include <asm-generic/ipipe.h>
 
 extern unsigned long wrong_size_cmpxchg(volatile void *ptr)
 	__noreturn;
@@ -23,7 +24,7 @@ static inline unsigned long __cmpxchg_local_generic(volatile void *ptr,
 	if (size == 8 && sizeof(unsigned long) != 8)
 		wrong_size_cmpxchg(ptr);
 
-	raw_local_irq_save(flags);
+	flags = hard_local_irq_save();
 	switch (size) {
 	case 1: prev = *(u8 *)ptr;
 		if (prev == old)
@@ -44,7 +45,7 @@ static inline unsigned long __cmpxchg_local_generic(volatile void *ptr,
 	default:
 		wrong_size_cmpxchg(ptr);
 	}
-	raw_local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 	return prev;
 }
 
@@ -57,11 +58,11 @@ static inline u64 __cmpxchg64_local_generic(volatile void *ptr,
 	u64 prev;
 	unsigned long flags;
 
-	raw_local_irq_save(flags);
+	flags = hard_local_irq_save();
 	prev = *(u64 *)ptr;
 	if (prev == old)
 		*(u64 *)ptr = new;
-	raw_local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 	return prev;
 }
 
diff --git a/include/asm-generic/ipipe.h b/include/asm-generic/ipipe.h
new file mode 100644
index 000000000000..102ffffe4a54
--- /dev/null
+++ b/include/asm-generic/ipipe.h
@@ -0,0 +1,93 @@
+/* -*- linux-c -*-
+ * include/asm-generic/ipipe.h
+ *
+ * Copyright (C) 2002-2017 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ */
+#ifndef __ASM_GENERIC_IPIPE_H
+#define __ASM_GENERIC_IPIPE_H
+
+#ifdef CONFIG_IPIPE
+
+#if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || defined(CONFIG_PROVE_LOCKING) || \
+	defined(CONFIG_PREEMPT_VOLUNTARY) || defined(CONFIG_IPIPE_DEBUG_CONTEXT)
+void __ipipe_uaccess_might_fault(void);
+#else
+#define __ipipe_uaccess_might_fault() might_fault()
+#endif
+
+#define hard_cond_local_irq_enable()		hard_local_irq_enable()
+#define hard_cond_local_irq_disable()		hard_local_irq_disable()
+#define hard_cond_local_irq_save()		hard_local_irq_save()
+#define hard_cond_local_irq_restore(flags)	hard_local_irq_restore(flags)
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+void ipipe_root_only(void);
+#else /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+static inline void ipipe_root_only(void) { }
+#endif /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+
+void ipipe_stall_root(void);
+
+void ipipe_unstall_root(void);
+
+unsigned long ipipe_test_and_stall_root(void);
+
+unsigned long ipipe_test_root(void);
+
+void ipipe_restore_root(unsigned long x);
+
+#else  /* !CONFIG_IPIPE */
+
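+/*
+ * Without the pipeline, the hard_* helpers simply map to the regular
+ * interrupt state accessors, so call sites build unchanged whether or
+ * not CONFIG_IPIPE is enabled.
+ */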
+#define hard_local_irq_save_notrace()		\
+	({					\
+		unsigned long __flags;		\
+		raw_local_irq_save(__flags);	\
+		__flags;			\
+	})
+
+#define hard_local_irq_restore_notrace(__flags)	\
+	raw_local_irq_restore(__flags)
+
+#define hard_local_irq_enable_notrace()	\
+	raw_local_irq_enable()
+
+#define hard_local_irq_disable_notrace()	\
+	raw_local_irq_disable()
+
+#define hard_local_irq_save()			\
+	({					\
+		unsigned long __flags;		\
+		local_irq_save(__flags);	\
+		__flags;			\
+	})
+#define hard_local_irq_restore(__flags)	local_irq_restore(__flags)
+#define hard_local_irq_enable()		local_irq_enable()
+#define hard_local_irq_disable()	local_irq_disable()
+#define hard_irqs_disabled()		irqs_disabled()
+
+#define hard_cond_local_irq_enable()		do { } while(0)
+#define hard_cond_local_irq_disable()		do { } while(0)
+#define hard_cond_local_irq_save()		0
+#define hard_cond_local_irq_restore(__flags)	do { (void)(__flags); } while(0)
+
+#define __ipipe_uaccess_might_fault()		might_fault()
+
+static inline void ipipe_root_only(void) { }
+
+#endif /* !CONFIG_IPIPE */
+
+#if defined(CONFIG_SMP) && defined(CONFIG_IPIPE)
+#define hard_smp_local_irq_save()		hard_local_irq_save()
+#define hard_smp_local_irq_restore(__flags)	hard_local_irq_restore(__flags)
+#else /* !CONFIG_SMP || !CONFIG_IPIPE */
+#define hard_smp_local_irq_save()		0
+#define hard_smp_local_irq_restore(__flags)	do { (void)(__flags); } while(0)
+#endif /* CONFIG_SMP && CONFIG_IPIPE */
+
+#endif
diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h
index c2de013b2cf4..109a4bcd741a 100644
--- a/include/asm-generic/percpu.h
+++ b/include/asm-generic/percpu.h
@@ -5,6 +5,7 @@
 #include <linux/compiler.h>
 #include <linux/threads.h>
 #include <linux/percpu-defs.h>
+#include <asm-generic/ipipe.h>
 
 #ifdef CONFIG_SMP
 
@@ -44,11 +45,29 @@ extern unsigned long __per_cpu_offset[NR_CPUS];
 #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
 #endif
 
+#ifdef CONFIG_IPIPE
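+/*
+ * __ipipe_raw_cpu_ptr()/__ipipe_raw_cpu_read() mirror the raw per-CPU
+ * accessors for pipeline code running with hard IRQs off; with
+ * CONFIG_IPIPE_DEBUG_INTERNAL on SMP, the offset is obtained through
+ * a checking helper instead of __my_cpu_offset.
+ */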
+#if defined(CONFIG_IPIPE_DEBUG_INTERNAL) && defined(CONFIG_SMP)
+unsigned long __ipipe_cpu_get_offset(void);
+#define __ipipe_cpu_offset  __ipipe_cpu_get_offset()
+#else
+#define __ipipe_cpu_offset  __my_cpu_offset
+#endif
+#ifndef __ipipe_raw_cpu_ptr
+#define __ipipe_raw_cpu_ptr(ptr)  SHIFT_PERCPU_PTR(ptr, __ipipe_cpu_offset)
+#endif
+#define __ipipe_raw_cpu_read(var) (*__ipipe_raw_cpu_ptr(&(var)))
+#endif /* CONFIG_IPIPE */
+
 #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
 extern void setup_per_cpu_areas(void);
 #endif
 
-#endif	/* SMP */
+#else /* !SMP */
+
+#define __ipipe_raw_cpu_ptr(ptr)  VERIFY_PERCPU_PTR(ptr)
+#define __ipipe_raw_cpu_read(var) (*__ipipe_raw_cpu_ptr(&(var)))
+
+#endif	/* !SMP */
 
 #ifndef PER_CPU_BASE_SECTION
 #ifdef CONFIG_SMP
@@ -144,9 +163,9 @@ do {									\
 #define this_cpu_generic_to_op(pcp, val, op)				\
 do {									\
 	unsigned long __flags;						\
-	raw_local_irq_save(__flags);					\
+	__flags = hard_local_irq_save();				\
 	raw_cpu_generic_to_op(pcp, val, op);				\
-	raw_local_irq_restore(__flags);					\
+	hard_local_irq_restore(__flags);				\
 } while (0)
 
 
@@ -154,9 +173,9 @@ do {									\
 ({									\
 	typeof(pcp) __ret;						\
 	unsigned long __flags;						\
-	raw_local_irq_save(__flags);					\
+	__flags = hard_local_irq_save();				\
 	__ret = raw_cpu_generic_add_return(pcp, val);			\
-	raw_local_irq_restore(__flags);					\
+	hard_local_irq_restore(__flags);				\
 	__ret;								\
 })
 
@@ -164,9 +183,9 @@ do {									\
 ({									\
 	typeof(pcp) __ret;						\
 	unsigned long __flags;						\
-	raw_local_irq_save(__flags);					\
+	__flags = hard_local_irq_save();				\
 	__ret = raw_cpu_generic_xchg(pcp, nval);			\
-	raw_local_irq_restore(__flags);					\
+	hard_local_irq_restore(__flags);				\
 	__ret;								\
 })
 
@@ -174,9 +193,9 @@ do {									\
 ({									\
 	typeof(pcp) __ret;						\
 	unsigned long __flags;						\
-	raw_local_irq_save(__flags);					\
+	__flags = hard_local_irq_save();				\
 	__ret = raw_cpu_generic_cmpxchg(pcp, oval, nval);		\
-	raw_local_irq_restore(__flags);					\
+	hard_local_irq_restore(__flags);				\
 	__ret;								\
 })
 
@@ -184,10 +203,10 @@ do {									\
 ({									\
 	int __ret;							\
 	unsigned long __flags;						\
-	raw_local_irq_save(__flags);					\
+	__flags = hard_local_irq_save();				\
 	__ret = raw_cpu_generic_cmpxchg_double(pcp1, pcp2,		\
 			oval1, oval2, nval1, nval2);			\
-	raw_local_irq_restore(__flags);					\
+	hard_local_irq_restore(__flags);				\
 	__ret;								\
 })
 
diff --git a/include/asm-generic/switch_to.h b/include/asm-generic/switch_to.h
index 5897d100a6e6..600fcb9f6cd9 100644
--- a/include/asm-generic/switch_to.h
+++ b/include/asm-generic/switch_to.h
@@ -17,10 +17,17 @@
  */
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
-
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
 #define switch_to(prev, next, last)					\
 	do {								\
+		hard_cond_local_irq_disable();                                  \
 		((last) = __switch_to((prev), (next)));			\
+		hard_cond_local_irq_enable();                                   \
 	} while (0)
-
+#else /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+#define switch_to(prev, next, last)					\
+	do {								\
+		((last) = __switch_to((prev), (next)));			\
+	} while (0)
+#endif /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
 #endif /* __ASM_GENERIC_SWITCH_TO_H */
diff --git a/include/clocksource/timer-sp804.h b/include/clocksource/timer-sp804.h
index a5b41f31a1c2..461da5cc7cdf 100644
--- a/include/clocksource/timer-sp804.h
+++ b/include/clocksource/timer-sp804.h
@@ -5,20 +5,23 @@
 struct clk;
 
 int __sp804_clocksource_and_sched_clock_init(void __iomem *,
+					     unsigned long phys,
 					     const char *, struct clk *, int);
 int __sp804_clockevents_init(void __iomem *, unsigned int,
 			     struct clk *, const char *);
 void sp804_timer_disable(void __iomem *);
 
-static inline void sp804_clocksource_init(void __iomem *base, const char *name)
+static inline void sp804_clocksource_init(void __iomem *base, unsigned long phys,
+					  const char *name)
 {
-	__sp804_clocksource_and_sched_clock_init(base, name, NULL, 0);
+	__sp804_clocksource_and_sched_clock_init(base, phys, name, NULL, 0);
 }
 
 static inline void sp804_clocksource_and_sched_clock_init(void __iomem *base,
+							  unsigned long phys,
 							  const char *name)
 {
-	__sp804_clocksource_and_sched_clock_init(base, name, NULL, 1);
+	__sp804_clocksource_and_sched_clock_init(base, phys, name, NULL, 1);
 }
 
 static inline void sp804_clockevents_init(void __iomem *base, unsigned int irq, const char *name)
diff --git a/include/ipipe/setup.h b/include/ipipe/setup.h
new file mode 100644
index 000000000000..c2bc5218cf65
--- /dev/null
+++ b/include/ipipe/setup.h
@@ -0,0 +1,10 @@
+#ifndef _IPIPE_SETUP_H
+#define _IPIPE_SETUP_H
+
+/*
+ * Placeholders for setup hooks defined by client domains.
+ */
+
+static inline void __ipipe_early_client_setup(void) { }
+
+#endif /* !_IPIPE_SETUP_H */
diff --git a/include/ipipe/thread_info.h b/include/ipipe/thread_info.h
new file mode 100644
index 000000000000..7038c12942c8
--- /dev/null
+++ b/include/ipipe/thread_info.h
@@ -0,0 +1,14 @@
+#ifndef _IPIPE_THREAD_INFO_H
+#define _IPIPE_THREAD_INFO_H
+
+/*
+ * Placeholder for private thread information defined by client
+ * domains.
+ */
+
+struct ipipe_threadinfo {
+};
+
+#define __ipipe_init_threadinfo(__p) do { } while (0)
+
+#endif /* !_IPIPE_THREAD_INFO_H */
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 8ae9a95ebf5b..ab59237a85d7 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -129,6 +129,15 @@ struct clock_event_device {
 	const struct cpumask	*cpumask;
 	struct list_head	list;
 	struct module		*owner;
+
+#ifdef CONFIG_IPIPE
+	struct ipipe_timer      *ipipe_timer;
+	unsigned                ipipe_stolen;
+
+#define clockevent_ipipe_stolen(evt) ((evt)->ipipe_stolen)
+#else
+#define clockevent_ipipe_stolen(evt) (0)
+#endif /* !CONFIG_IPIPE */
 } ____cacheline_aligned;
 
 /* Helpers to verify state of a clockevent device */
diff --git a/include/linux/console.h b/include/linux/console.h
index d09951d5a94e..e688eb7f0227 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -141,10 +141,12 @@ static inline int con_debug_leave(void)
 #define CON_ANYTIME	(16) /* Safe to call when cpu is offline */
 #define CON_BRL		(32) /* Used for a braille device */
 #define CON_EXTENDED	(64) /* Use the extended output format a la /dev/kmsg */
+#define CON_RAW		(128) /* Supports raw write mode */
 
 struct console {
 	char	name[16];
 	void	(*write)(struct console *, const char *, unsigned);
+	void	(*write_raw)(struct console *, const char *, unsigned);
 	int	(*read)(struct console *, char *, unsigned);
 	struct tty_driver *(*device)(struct console *, int *);
 	void	(*unblank)(void);
diff --git a/include/linux/dw_apb_timer.h b/include/linux/dw_apb_timer.h
index 14f072edbca5..66575506377b 100644
--- a/include/linux/dw_apb_timer.h
+++ b/include/linux/dw_apb_timer.h
@@ -32,6 +32,7 @@ struct dw_apb_clock_event_device {
 struct dw_apb_clocksource {
 	struct dw_apb_timer			timer;
 	struct clocksource			cs;
+	unsigned long				phys;
 };
 
 void dw_apb_clockevent_register(struct dw_apb_clock_event_device *dw_ced);
@@ -44,7 +45,7 @@ dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,
 		       void __iomem *base, int irq, unsigned long freq);
 struct dw_apb_clocksource *
 dw_apb_clocksource_init(unsigned rating, const char *name, void __iomem *base,
-			unsigned long freq);
+			unsigned long phys, unsigned long freq);
 void dw_apb_clocksource_register(struct dw_apb_clocksource *dw_cs);
 void dw_apb_clocksource_start(struct dw_apb_clocksource *dw_cs);
 u64 dw_apb_clocksource_read(struct dw_apb_clocksource *dw_cs);
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8a8cb3c401b2..ac9333f9161f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -160,6 +160,7 @@ enum {
 	FTRACE_OPS_FL_PID			= 1 << 13,
 	FTRACE_OPS_FL_RCU			= 1 << 14,
 	FTRACE_OPS_FL_TRACE_ARRAY		= 1 << 15,
+	FTRACE_OPS_FL_IPIPE_EXCLUSIVE		= 1 << 17,
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 5dd9c982e2cb..c3699ef94760 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -392,7 +392,7 @@ struct gpio_chip {
 	void __iomem *reg_dir_in;
 	bool bgpio_dir_unreadable;
 	int bgpio_bits;
-	spinlock_t bgpio_lock;
+	ipipe_spinlock_t bgpio_lock;
 	unsigned long bgpio_data;
 	unsigned long bgpio_dir;
 #endif /* CONFIG_GPIO_GENERIC */
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index da0af631ded5..1b8f0fd221a1 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -6,6 +6,7 @@
 #include <linux/lockdep.h>
 #include <linux/ftrace_irq.h>
 #include <linux/vtime.h>
+#include <linux/ipipe.h>
 #include <asm/hardirq.h>
 
 
@@ -67,6 +68,7 @@ extern void irq_exit(void);
 
 #define nmi_enter()						\
 	do {							\
+		__ipipe_nmi_enter();				\
 		arch_nmi_enter();				\
 		printk_nmi_enter();				\
 		lockdep_off();					\
@@ -87,6 +89,7 @@ extern void irq_exit(void);
 		lockdep_on();					\
 		printk_nmi_exit();				\
 		arch_nmi_exit();				\
+		__ipipe_nmi_exit();				\
 	} while (0)
 
 #endif /* LINUX_HARDIRQ_H */
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 89fc59dab57d..21e2ebd00368 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -504,6 +504,23 @@ extern bool force_irqthreads;
 #define hard_irq_disable()	do { } while(0)
 #endif
 
+/*
+ * Unlike what other virtualized interrupt disabling schemes may
+ * assume, local_irq_restore() cannot be expected to turn hard
+ * interrupts back on when pipelining. hard_irq_enable() is introduced
+ * as the counterpart of hard_irq_disable(), to turn them back on
+ * unconditionally. The only sane sequence mixing virtual and real
+ * disable state manipulation is:
+ *
+ * 1. local_irq_save/disable
+ * 2. hard_irq_disable
+ * 3. hard_irq_enable
+ * 4. local_irq_restore/enable
+ */
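+/*
+ * For illustration, a section which must not be preempted by either
+ * stage would read:
+ *
+ *	local_irq_save(flags);
+ *	hard_irq_disable();
+ *	... fully serialized section ...
+ *	hard_irq_enable();
+ *	local_irq_restore(flags);
+ */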
+#ifndef hard_irq_enable
+#define hard_irq_enable()	hard_cond_local_irq_enable()
+#endif
+
 /* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
    frequency threaded job scheduling. For almost all the purposes
    tasklets are more than enough. F.e. all serial device BHs et
diff --git a/include/linux/ipipe.h b/include/linux/ipipe.h
new file mode 100644
index 000000000000..fe90cb55d462
--- /dev/null
+++ b/include/linux/ipipe.h
@@ -0,0 +1,721 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe.h
+ *
+ * Copyright (C) 2002-2014 Philippe Gerum.
+ *               2007 Jan Kiszka.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_H
+#define __LINUX_IPIPE_H
+
+#include <linux/spinlock.h>
+#include <linux/cache.h>
+#include <linux/percpu.h>
+#include <linux/irq.h>
+#include <linux/thread_info.h>
+#include <linux/ipipe_debug.h>
+#include <asm/ptrace.h>
+#ifdef CONFIG_HAVE_IPIPE_SUPPORT
+#include <asm/ipipe.h>
+#endif
+
+struct cpuidle_device;
+struct cpuidle_state;
+struct kvm_vcpu;
+struct ipipe_vm_notifier;
+struct irq_desc;
+struct task_struct;
+struct mm_struct;
+
+#ifdef CONFIG_IPIPE
+
+#include <linux/ipipe_domain.h>
+
+#define IPIPE_CORE_APIREV  CONFIG_IPIPE_CORE_APIREV
+
+#include <linux/compiler.h>
+#include <linux/linkage.h>
+#include <asm/ipipe_base.h>
+
+struct pt_regs;
+struct ipipe_domain;
+
+struct ipipe_vm_notifier {
+	void (*handler)(struct ipipe_vm_notifier *nfy);
+};
+
+static inline int ipipe_virtual_irq_p(unsigned int irq)
+{
+	return irq >= IPIPE_VIRQ_BASE && irq < IPIPE_NR_IRQS;
+}
+
+void __ipipe_init_early(void);
+
+void __ipipe_init(void);
+
+#ifdef CONFIG_PROC_FS
+void __ipipe_init_proc(void);
+#ifdef CONFIG_IPIPE_TRACE
+void __ipipe_init_tracer(void);
+#else /* !CONFIG_IPIPE_TRACE */
+static inline void __ipipe_init_tracer(void) { }
+#endif /* CONFIG_IPIPE_TRACE */
+#else	/* !CONFIG_PROC_FS */
+static inline void __ipipe_init_proc(void) { }
+#endif	/* CONFIG_PROC_FS */
+
+void __ipipe_restore_root_nosync(unsigned long x);
+
+#define IPIPE_IRQF_NOACK    0x1
+#define IPIPE_IRQF_NOSYNC   0x2
+
+void __ipipe_dispatch_irq(unsigned int irq, int flags);
+
+void __ipipe_do_sync_stage(void);
+
+void __ipipe_do_sync_pipeline(struct ipipe_domain *top);
+
+void __ipipe_lock_irq(unsigned int irq);
+
+void __ipipe_unlock_irq(unsigned int irq);
+
+void __ipipe_do_critical_sync(unsigned int irq, void *cookie);
+
+void __ipipe_ack_edge_irq(struct irq_desc *desc);
+
+void __ipipe_nop_irq(struct irq_desc *desc);
+
+static inline void __ipipe_idle(void)
+{
+	ipipe_unstall_root();
+}
+
+#ifndef __ipipe_sync_check
+#define __ipipe_sync_check	1
+#endif
+
+static inline void __ipipe_sync_stage(void)
+{
+	if (likely(__ipipe_sync_check))
+		__ipipe_do_sync_stage();
+}
+
+#ifndef __ipipe_run_irqtail
+#define __ipipe_run_irqtail(irq) do { } while(0)
+#endif
+
+int __ipipe_log_printk(const char *fmt, va_list args);
+void __ipipe_flush_printk(unsigned int irq, void *cookie);
+
+#define __ipipe_get_cpu(flags)	({ (flags) = hard_preempt_disable(); ipipe_processor_id(); })
+#define __ipipe_put_cpu(flags)	hard_preempt_enable(flags)
+
+int __ipipe_notify_kevent(int event, void *data);
+
+#define __ipipe_report_sigwake(p)					\
+	do {								\
+		if (ipipe_notifier_enabled_p(p))			\
+			__ipipe_notify_kevent(IPIPE_KEVT_SIGWAKE, p);	\
+	} while (0)
+
+struct ipipe_cpu_migration_data {
+	struct task_struct *task;
+	int dest_cpu;
+};
+
+#define __ipipe_report_setaffinity(__p, __dest_cpu)			\
+	do {								\
+		struct ipipe_cpu_migration_data d = {			\
+			.task = (__p),					\
+			.dest_cpu = (__dest_cpu),			\
+		};							\
+		if (ipipe_notifier_enabled_p(__p))			\
+			__ipipe_notify_kevent(IPIPE_KEVT_SETAFFINITY, &d); \
+	} while (0)
+
+#define __ipipe_report_exit(p)						\
+	do {								\
+		if (ipipe_notifier_enabled_p(p))			\
+			__ipipe_notify_kevent(IPIPE_KEVT_EXIT, p);	\
+	} while (0)
+
+#define __ipipe_report_setsched(p)					\
+	do {								\
+		if (ipipe_notifier_enabled_p(p))			\
+			__ipipe_notify_kevent(IPIPE_KEVT_SETSCHED, p); \
+	} while (0)
+
+#define __ipipe_report_schedule(prev, next)				\
+do {									\
+	if (ipipe_notifier_enabled_p(next) ||				\
+	    ipipe_notifier_enabled_p(prev)) {				\
+		__this_cpu_write(ipipe_percpu.rqlock_owner, prev);	\
+		__ipipe_notify_kevent(IPIPE_KEVT_SCHEDULE, next);	\
+	}								\
+} while (0)
+
+#define __ipipe_report_cleanup(mm)					\
+	__ipipe_notify_kevent(IPIPE_KEVT_CLEANUP, mm)
+
+#define __ipipe_report_clockfreq_update(freq)				\
+	__ipipe_notify_kevent(IPIPE_KEVT_CLOCKFREQ, &(freq))
+
+struct ipipe_ptrace_resume_data {
+	struct task_struct *task;
+	long request;
+};
+
+#define __ipipe_report_ptrace_resume(__p, __request)			\
+	do {								\
+		struct ipipe_ptrace_resume_data d = {			\
+			.task = (__p),					\
+			.request = (__request),				\
+		};							\
+		if (ipipe_notifier_enabled_p(__p))			\
+			__ipipe_notify_kevent(IPIPE_KEVT_PTRESUME, &d); \
+	} while (0)
+
+int __ipipe_notify_syscall(struct pt_regs *regs);
+
+int __ipipe_notify_trap(int exception, struct pt_regs *regs);
+
+#define __ipipe_report_trap(exception, regs)				\
+	__ipipe_notify_trap(exception, regs)
+
+void __ipipe_call_mayday(struct pt_regs *regs);
+
+int __ipipe_notify_user_intreturn(void);
+
+#define __ipipe_serial_debug(__fmt, __args...)	raw_printk(__fmt, ##__args)
+
+struct ipipe_trap_data {
+	int exception;
+	struct pt_regs *regs;
+};
+
+/* ipipe_set_hooks(..., enables) */
+#define IPIPE_SYSCALL	__IPIPE_SYSCALL_E
+#define IPIPE_TRAP	__IPIPE_TRAP_E
+#define IPIPE_KEVENT	__IPIPE_KEVENT_E
+
+struct ipipe_sysinfo {
+	int sys_nr_cpus;	/* Number of CPUs on board */
+	int sys_hrtimer_irq;	/* hrtimer device IRQ */
+	u64 sys_hrtimer_freq;	/* hrtimer device frequency */
+	u64 sys_hrclock_freq;	/* hrclock device frequency */
+	u64 sys_cpu_freq;	/* CPU frequency (Hz) */
+	struct ipipe_arch_sysinfo arch;
+};
+
+struct ipipe_work_header {
+	size_t size;
+	void (*handler)(struct ipipe_work_header *work);
+};
+
+extern unsigned int __ipipe_printk_virq;
+
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq);
+
+void __ipipe_complete_domain_migration(void);
+
+int __ipipe_switch_tail(void);
+
+int __ipipe_migrate_head(void);
+
+void __ipipe_reenter_root(void);
+
+void __ipipe_share_current(int flags);
+
+void __ipipe_arch_share_current(int flags);
+
+int __ipipe_disable_ondemand_mappings(struct task_struct *p);
+
+int __ipipe_pin_vma(struct mm_struct *mm, struct vm_area_struct *vma);
+
+/*
+ * Obsolete - no arch implements PIC muting anymore. Null helpers are
+ * kept for building legacy co-kernel releases.
+ */
+static inline void ipipe_mute_pic(void) { }
+static inline void ipipe_unmute_pic(void) { }
+
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+
+#define prepare_arch_switch(next)			\
+	do {						\
+		hard_local_irq_enable();		\
+		__ipipe_report_schedule(current, next);	\
+	} while(0)
+
+#ifndef ipipe_get_active_mm
+static inline struct mm_struct *ipipe_get_active_mm(void)
+{
+	return __this_cpu_read(ipipe_percpu.active_mm);
+}
+#define ipipe_get_active_mm ipipe_get_active_mm
+#endif
+
+#else /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+
+#define prepare_arch_switch(next)			\
+	do {						\
+		__ipipe_report_schedule(current, next);	\
+		hard_local_irq_disable();		\
+	} while(0)
+
+#ifndef ipipe_get_active_mm
+#define ipipe_get_active_mm()	(current->active_mm)
+#endif
+
+#endif /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+
+static inline bool __ipipe_hrclock_ok(void)
+{
+	return __ipipe_hrclock_freq != 0;
+}
+
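+/*
+ * NMI entry saves the root stage stall bit and forces it on, so that
+ * code running over NMI context sees the root stage stalled; NMI exit
+ * restores the bit to its value at entry.
+ */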
+static inline void __ipipe_nmi_enter(void)
+{
+	__this_cpu_write(ipipe_percpu.nmi_state, __ipipe_root_status);
+	__set_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+	ipipe_save_context_nmi();
+}
+
+static inline void __ipipe_nmi_exit(void)
+{
+	ipipe_restore_context_nmi();
+	if (!test_bit(IPIPE_STALL_FLAG, raw_cpu_ptr(&ipipe_percpu.nmi_state)))
+		__clear_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+}
+
+/* KVM-side calls, hw IRQs off. */
+static inline void __ipipe_enter_vm(struct ipipe_vm_notifier *vmf)
+{
+	struct ipipe_percpu_data *p;
+
+	p = raw_cpu_ptr(&ipipe_percpu);
+	p->vm_notifier = vmf;
+	barrier();
+}
+
+static inline void __ipipe_exit_vm(void)
+{
+	struct ipipe_percpu_data *p;
+
+	p = raw_cpu_ptr(&ipipe_percpu);
+	p->vm_notifier = NULL;
+	barrier();
+}
+
+/* Client-side call, hw IRQs off. */
+void __ipipe_notify_vm_preemption(void);
+
+static inline void __ipipe_sync_pipeline(struct ipipe_domain *top)
+{
+	if (__ipipe_current_domain != top) {
+		__ipipe_do_sync_pipeline(top);
+		return;
+	}
+	if (!test_bit(IPIPE_STALL_FLAG, &ipipe_this_cpu_context(top)->status))
+		__ipipe_sync_stage();
+}
+
+void ipipe_register_head(struct ipipe_domain *ipd,
+			 const char *name);
+
+void ipipe_unregister_head(struct ipipe_domain *ipd);
+
+int ipipe_request_irq(struct ipipe_domain *ipd,
+		      unsigned int irq,
+		      ipipe_irq_handler_t handler,
+		      void *cookie,
+		      ipipe_irq_ackfn_t ackfn);
+
+void ipipe_free_irq(struct ipipe_domain *ipd,
+		    unsigned int irq);
+
+void ipipe_raise_irq(unsigned int irq);
+
+void ipipe_set_hooks(struct ipipe_domain *ipd,
+		     int enables);
+
+int ipipe_handle_syscall(struct thread_info *ti,
+			 unsigned long nr, struct pt_regs *regs);
+
+unsigned int ipipe_alloc_virq(void);
+
+void ipipe_free_virq(unsigned int virq);
+
+static inline void ipipe_post_irq_head(unsigned int irq)
+{
+	__ipipe_set_irq_pending(ipipe_head_domain, irq);
+}
+
+static inline void ipipe_post_irq_root(unsigned int irq)
+{
+	__ipipe_set_irq_pending(&ipipe_root, irq);
+}
+
+static inline void ipipe_stall_head(void)
+{
+	hard_local_irq_disable();
+	__set_bit(IPIPE_STALL_FLAG, &__ipipe_head_status);
+}
+
+static inline unsigned long ipipe_test_and_stall_head(void)
+{
+	hard_local_irq_disable();
+	return __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_head_status);
+}
+
+static inline unsigned long ipipe_test_head(void)
+{
+	unsigned long flags, ret;
+
+	flags = hard_smp_local_irq_save();
+	ret = test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status);
+	hard_smp_local_irq_restore(flags);
+
+	return ret;
+}
+
+void ipipe_unstall_head(void);
+
+void __ipipe_restore_head(unsigned long x);
+
+static inline void ipipe_restore_head(unsigned long x)
+{
+	ipipe_check_irqoff();
+	if ((x ^ test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status)) & 1)
+		__ipipe_restore_head(x);
+}
+
+void __ipipe_post_work_root(struct ipipe_work_header *work);
+
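+/*
+ * The call to header_not_at_start(), which is declared but defined
+ * nowhere, only survives when offsetof(typeof(*p), header) is
+ * non-zero, turning a misplaced work header into a link error: the
+ * header must be the first member of the enclosing work structure.
+ */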
+#define ipipe_post_work_root(p, header)			\
+	do {						\
+		void header_not_at_start(void);		\
+		if (offsetof(typeof(*(p)), header)) {	\
+			header_not_at_start();		\
+		}					\
+		__ipipe_post_work_root(&(p)->header);	\
+	} while (0)
+
+int ipipe_get_sysinfo(struct ipipe_sysinfo *sysinfo);
+
+unsigned long ipipe_critical_enter(void (*syncfn)(void));
+
+void ipipe_critical_exit(unsigned long flags);
+
+void ipipe_prepare_panic(void);
+
+#ifdef CONFIG_SMP
+#ifndef ipipe_smp_p
+#define ipipe_smp_p (1)
+#endif
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask);
+void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask);
+#else  /* !CONFIG_SMP */
+#define ipipe_smp_p (0)
+static inline
+int ipipe_set_irq_affinity(unsigned int irq, cpumask_t cpumask) { return 0; }
+static inline void ipipe_send_ipi(unsigned int ipi, cpumask_t cpumask) { }
+static inline void ipipe_disable_smp(void) { }
+#endif	/* CONFIG_SMP */
+
+static inline void ipipe_restore_root_nosync(unsigned long x)
+{
+	unsigned long flags;
+
+	flags = hard_smp_local_irq_save();
+	__ipipe_restore_root_nosync(x);
+	hard_smp_local_irq_restore(flags);
+}
+
+/* Must be called hw IRQs off. */
+static inline void ipipe_lock_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = __ipipe_current_domain;
+	if (ipd == ipipe_root_domain)
+		__ipipe_lock_irq(irq);
+}
+
+/* Must be called hw IRQs off. */
+static inline void ipipe_unlock_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = __ipipe_current_domain;
+	if (ipd == ipipe_root_domain)
+		__ipipe_unlock_irq(irq);
+}
+
+static inline struct ipipe_threadinfo *ipipe_current_threadinfo(void)
+{
+	return &current_thread_info()->ipipe_data;
+}
+
+#define ipipe_task_threadinfo(p) (&task_thread_info(p)->ipipe_data)
+
+int ipipe_enable_irq(unsigned int irq);
+
+static inline void ipipe_disable_irq(unsigned int irq)
+{
+	struct irq_desc *desc;
+	struct irq_chip *chip;
+
+	desc = irq_to_desc(irq);
+	if (desc == NULL)
+		return;
+
+	chip = irq_desc_get_chip(desc);
+
+	if (WARN_ON_ONCE(chip->irq_disable == NULL && chip->irq_mask == NULL))
+		return;
+
+	if (chip->irq_disable)
+		chip->irq_disable(&desc->irq_data);
+	else
+		chip->irq_mask(&desc->irq_data);
+}
+
+static inline void ipipe_end_irq(unsigned int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	if (desc)
+		desc->ipipe_end(desc);
+}
+
+static inline int ipipe_chained_irq_p(struct irq_desc *desc)
+{
+	void __ipipe_chained_irq(struct irq_desc *desc);
+
+	return desc->handle_irq == __ipipe_chained_irq;
+}
+
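+/*
+ * Cascaded interrupt handlers (such as the PMU demux handler patched
+ * in this series) call this instead of generic_handle_irq(), so that
+ * the decoded interrupt is fed to the pipeline without immediately
+ * synchronizing the interrupt log.
+ */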
+static inline void ipipe_handle_demuxed_irq(unsigned int cascade_irq)
+{
+	ipipe_trace_irq_entry(cascade_irq);
+	__ipipe_dispatch_irq(cascade_irq, IPIPE_IRQF_NOSYNC);
+	ipipe_trace_irq_exit(cascade_irq);
+}
+
+static inline void __ipipe_init_threadflags(struct thread_info *ti)
+{
+	ti->ipipe_flags = 0;
+}
+
+static inline
+void ipipe_set_ti_thread_flag(struct thread_info *ti, int flag)
+{
+	set_bit(flag, &ti->ipipe_flags);
+}
+
+static inline
+void ipipe_clear_ti_thread_flag(struct thread_info *ti, int flag)
+{
+	clear_bit(flag, &ti->ipipe_flags);
+}
+
+static inline
+int ipipe_test_and_clear_ti_thread_flag(struct thread_info *ti, int flag)
+{
+	return test_and_clear_bit(flag, &ti->ipipe_flags);
+}
+
+static inline
+int ipipe_test_ti_thread_flag(struct thread_info *ti, int flag)
+{
+	return test_bit(flag, &ti->ipipe_flags);
+}
+
+#define ipipe_set_thread_flag(flag) \
+	ipipe_set_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_clear_thread_flag(flag) \
+	ipipe_clear_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_test_and_clear_thread_flag(flag) \
+	ipipe_test_and_clear_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_test_thread_flag(flag) \
+	ipipe_test_ti_thread_flag(current_thread_info(), flag)
+
+#define ipipe_enable_notifier(p)					\
+	ipipe_set_ti_thread_flag(task_thread_info(p), TIP_NOTIFY)
+
+#define ipipe_disable_notifier(p)					\
+	do {								\
+		struct thread_info *ti = task_thread_info(p);		\
+		ipipe_clear_ti_thread_flag(ti, TIP_NOTIFY);		\
+		ipipe_clear_ti_thread_flag(ti, TIP_MAYDAY);		\
+	} while (0)
+
+#define ipipe_notifier_enabled_p(p)					\
+	ipipe_test_ti_thread_flag(task_thread_info(p), TIP_NOTIFY)
+
+#define ipipe_raise_mayday(p)						\
+	do {								\
+		struct thread_info *ti = task_thread_info(p);		\
+		ipipe_check_irqoff();					\
+		if (ipipe_test_ti_thread_flag(ti, TIP_NOTIFY))		\
+			ipipe_set_ti_thread_flag(ti, TIP_MAYDAY);	\
+	} while (0)
+
+#define ipipe_enable_user_intret_notifier()				\
+	ipipe_set_thread_flag(TIP_USERINTRET)
+
+#define ipipe_disable_user_intret_notifier()				\
+	ipipe_clear_thread_flag(TIP_USERINTRET)
+
+#define ipipe_user_intret_notifier_enabled(ti)				\
+	ipipe_test_ti_thread_flag(ti, TIP_USERINTRET)
+
+#ifdef CONFIG_IPIPE_TRACE
+void __ipipe_tracer_hrclock_initialized(void);
+#else /* !CONFIG_IPIPE_TRACE */
+#define __ipipe_tracer_hrclock_initialized()	do { } while(0)
+#endif /* !CONFIG_IPIPE_TRACE */
+
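+/*
+ * Guard an mm switch against interrupts from the head stage: a no-op
+ * when preemptible context switching is enabled, a plain hard IRQ
+ * save/restore pair otherwise (see exec_mmap()).
+ */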
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+#define ipipe_mm_switch_protect(__flags)	do { (void)(__flags); } while (0)
+#define ipipe_mm_switch_unprotect(__flags)	do { (void)(__flags); } while (0)
+#else /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+#define ipipe_mm_switch_protect(__flags)		\
+	do {						\
+		(__flags) = hard_local_irq_save();	\
+	} while (0)
+#define ipipe_mm_switch_unprotect(__flags)		\
+	do {						\
+		hard_local_irq_restore(__flags);	\
+	} while (0)
+#endif /* !CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH */
+
+bool ipipe_enter_cpuidle(struct cpuidle_device *dev,
+			 struct cpuidle_state *state);
+
+#else	/* !CONFIG_IPIPE */
+
+static inline void __ipipe_init_early(void) { }
+
+static inline void __ipipe_init(void) { }
+
+static inline void __ipipe_init_proc(void) { }
+
+static inline void __ipipe_idle(void) { }
+
+static inline void __ipipe_report_sigwake(struct task_struct *p) { }
+
+static inline void __ipipe_report_setaffinity(struct task_struct *p,
+					      int dest_cpu) { }
+
+static inline void __ipipe_report_setsched(struct task_struct *p) { }
+
+static inline void __ipipe_report_exit(struct task_struct *p) { }
+
+static inline void __ipipe_report_cleanup(struct mm_struct *mm) { }
+
+static inline void __ipipe_report_ptrace_resume(struct task_struct *p,
+						long request) { }
+
+#define __ipipe_report_trap(exception, regs)  0
+
+#define hard_preempt_disable()		({ preempt_disable(); 0; })
+#define hard_preempt_enable(flags)	({ preempt_enable(); (void)(flags); })
+
+#define __ipipe_get_cpu(flags)		({ (void)(flags); get_cpu(); })
+#define __ipipe_put_cpu(flags)		\
+	do {				\
+		(void)(flags);		\
+		put_cpu();		\
+	} while (0)
+
+#define __ipipe_root_tick_p(regs)	1
+
+#define ipipe_handle_domain_irq(__domain, __hwirq, __regs)	\
+	handle_domain_irq(__domain, __hwirq, __regs)
+
+#define ipipe_handle_demuxed_irq(irq)		generic_handle_irq(irq)
+
+#define __ipipe_enter_vm(vmf)	do { } while (0)
+
+static inline void __ipipe_exit_vm(void) { }
+
+static inline void __ipipe_notify_vm_preemption(void) { }
+
+#define __ipipe_notify_user_intreturn()	0
+
+#define __ipipe_serial_debug(__fmt, __args...)	do { } while (0)
+
+#define __ipipe_root_p		1
+#define ipipe_root_p		1
+
+#define ipipe_mm_switch_protect(__flags)	do { (void)(__flags); } while (0)
+#define ipipe_mm_switch_unprotect(__flags)	do { (void)(__flags); } while (0)
+
+static inline void __ipipe_init_threadflags(struct thread_info *ti) { }
+
+static inline void __ipipe_complete_domain_migration(void) { }
+
+static inline int __ipipe_switch_tail(void)
+{
+	return 0;
+}
+
+static inline void __ipipe_nmi_enter(void) { }
+
+static inline void __ipipe_nmi_exit(void) { }
+
+#define ipipe_processor_id()	smp_processor_id()
+
+static inline void ipipe_lock_irq(unsigned int irq) { }
+
+static inline void ipipe_unlock_irq(unsigned int irq) { }
+
+static inline
+int ipipe_handle_syscall(struct thread_info *ti,
+			 unsigned long nr, struct pt_regs *regs)
+{
+	return 0;
+}
+
+static inline
+bool ipipe_enter_cpuidle(struct cpuidle_device *dev,
+			 struct cpuidle_state *state)
+{
+	return true;
+}
+
+#define ipipe_user_intret_notifier_enabled(ti)	0
+
+#endif	/* !CONFIG_IPIPE */
+
+#ifdef CONFIG_IPIPE_WANT_PTE_PINNING
+void __ipipe_pin_mapping_globally(unsigned long start,
+				  unsigned long end);
+#else
+static inline void __ipipe_pin_mapping_globally(unsigned long start,
+						unsigned long end)
+{ }
+#endif
+
+#ifndef ipipe_root_nr_syscalls
+#define ipipe_root_nr_syscalls(ti)	NR_syscalls
+#endif
+
+#endif	/* !__LINUX_IPIPE_H */
diff --git a/include/linux/ipipe_debug.h b/include/linux/ipipe_debug.h
new file mode 100644
index 000000000000..5d7efefbdddf
--- /dev/null
+++ b/include/linux/ipipe_debug.h
@@ -0,0 +1,100 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_debug.h
+ *
+ * Copyright (C) 2012 Philippe Gerum <rpm@xenomai.org>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_DEBUG_H
+#define __LINUX_IPIPE_DEBUG_H
+
+#include <linux/ipipe_domain.h>
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+
+#include <asm/bug.h>
+
+static inline int ipipe_disable_context_check(void)
+{
+	return xchg(raw_cpu_ptr(&ipipe_percpu.context_check), 0);
+}
+
+static inline void ipipe_restore_context_check(int old_state)
+{
+	__this_cpu_write(ipipe_percpu.context_check, old_state);
+}
+
+static inline void ipipe_context_check_off(void)
+{
+	int cpu;
+	for_each_online_cpu(cpu)
+		per_cpu(ipipe_percpu, cpu).context_check = 0;
+}
+
+static inline void ipipe_save_context_nmi(void)
+{
+	int state = ipipe_disable_context_check();
+	__this_cpu_write(ipipe_percpu.context_check_saved, state);
+}
+
+static inline void ipipe_restore_context_nmi(void)
+{
+	ipipe_restore_context_check(__this_cpu_read(ipipe_percpu.context_check_saved));
+}
+
+#else	/* !CONFIG_IPIPE_DEBUG_CONTEXT */
+
+static inline int ipipe_disable_context_check(void)
+{
+	return 0;
+}
+
+static inline void ipipe_restore_context_check(int old_state) { }
+
+static inline void ipipe_context_check_off(void) { }
+
+static inline void ipipe_save_context_nmi(void) { }
+
+static inline void ipipe_restore_context_nmi(void) { }
+
+#endif	/* !CONFIG_IPIPE_DEBUG_CONTEXT */
+
+#ifdef CONFIG_IPIPE_DEBUG
+
+#define ipipe_check_irqoff()					\
+	do {							\
+		if (WARN_ON_ONCE(!hard_irqs_disabled()))	\
+			hard_local_irq_disable();		\
+	} while (0)
+
+#else /* !CONFIG_IPIPE_DEBUG */
+
+static inline void ipipe_check_irqoff(void) { }
+
+#endif /* !CONFIG_IPIPE_DEBUG */
+
+#ifdef CONFIG_IPIPE_DEBUG_INTERNAL
+#define IPIPE_WARN(c)		WARN_ON(c)
+#define IPIPE_WARN_ONCE(c)	WARN_ON_ONCE(c)
+#define IPIPE_BUG_ON(c)		BUG_ON(c)
+#else
+#define IPIPE_WARN(c)		do { (void)(c); } while (0)
+#define IPIPE_WARN_ONCE(c)	do { (void)(c); } while (0)
+#define IPIPE_BUG_ON(c)		do { (void)(c); } while (0)
+#endif
+
+#endif /* !__LINUX_IPIPE_DEBUG_H */
diff --git a/include/linux/ipipe_domain.h b/include/linux/ipipe_domain.h
new file mode 100644
index 000000000000..6c7504d15055
--- /dev/null
+++ b/include/linux/ipipe_domain.h
@@ -0,0 +1,368 @@
+/*   -*- linux-c -*-
+ *   include/linux/ipipe_domain.h
+ *
+ *   Copyright (C) 2007-2012 Philippe Gerum.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ *   USA; either version 2 of the License, or (at your option) any later
+ *   version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_DOMAIN_H
+#define __LINUX_IPIPE_DOMAIN_H
+
+#ifdef CONFIG_IPIPE
+
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <asm/ptrace.h>
+#include <asm/hw_irq.h>
+#include <asm/ipipe_base.h>
+
+struct task_struct;
+struct mm_struct;
+struct irq_desc;
+struct ipipe_vm_notifier;
+
+#define __bpl_up(x)		(((x)+(BITS_PER_LONG-1)) & ~(BITS_PER_LONG-1))
+/* Number of virtual IRQs (must be a multiple of BITS_PER_LONG) */
+#define IPIPE_NR_VIRQS		BITS_PER_LONG
+/* First virtual IRQ # (must be aligned on BITS_PER_LONG) */
+#define IPIPE_VIRQ_BASE		__bpl_up(IPIPE_NR_XIRQS)
+/* Total number of IRQ slots */
+#define IPIPE_NR_IRQS		(IPIPE_VIRQ_BASE+IPIPE_NR_VIRQS)
+
+#define IPIPE_IRQ_MAPSZ		(IPIPE_NR_IRQS / BITS_PER_LONG)
+#define IPIPE_IRQ_1MAPSZ	BITS_PER_LONG
+#if IPIPE_IRQ_MAPSZ > BITS_PER_LONG * BITS_PER_LONG
+/*
+ * We need a 4-level mapping, up to 16M IRQs (64bit long, MAXSMP
+ * defines 512K IRQs).
+ */
+#define __IPIPE_IRQMAP_LEVELS	4
+#define IPIPE_IRQ_2MAPSZ	(BITS_PER_LONG * BITS_PER_LONG)
+#elif IPIPE_IRQ_MAPSZ > BITS_PER_LONG
+/*
+ * 3-level mapping. Up to 256K IRQs (64 bit long).
+ */
+#define __IPIPE_IRQMAP_LEVELS	3
+#else
+/*
+ * 2-level mapping is enough. Up to 4K IRQs (64 bit long).
+ */
+#define __IPIPE_IRQMAP_LEVELS	2
+#endif
+
+/* Per-cpu pipeline status */
+#define IPIPE_STALL_FLAG	0 /* interrupts (virtually) disabled. */
+#define IPIPE_STALL_MASK	(1L << IPIPE_STALL_FLAG)
+
+/* Interrupt control bits */
+#define IPIPE_HANDLE_FLAG	0
+#define IPIPE_STICKY_FLAG	1
+#define IPIPE_LOCK_FLAG		2
+#define IPIPE_HANDLE_MASK	(1 << IPIPE_HANDLE_FLAG)
+#define IPIPE_STICKY_MASK	(1 << IPIPE_STICKY_FLAG)
+#define IPIPE_LOCK_MASK		(1 << IPIPE_LOCK_FLAG)
+
+#define __IPIPE_SYSCALL_P  0
+#define __IPIPE_TRAP_P     1
+#define __IPIPE_KEVENT_P   2
+#define __IPIPE_SYSCALL_E (1 << __IPIPE_SYSCALL_P)
+#define __IPIPE_TRAP_E	  (1 << __IPIPE_TRAP_P)
+#define __IPIPE_KEVENT_E  (1 << __IPIPE_KEVENT_P)
+#define __IPIPE_ALL_E	   0x7
+#define __IPIPE_SYSCALL_R (8 << __IPIPE_SYSCALL_P)
+#define __IPIPE_TRAP_R	  (8 << __IPIPE_TRAP_P)
+#define __IPIPE_KEVENT_R  (8 << __IPIPE_KEVENT_P)
+#define __IPIPE_SHIFT_R	   3
+#define __IPIPE_ALL_R	  (__IPIPE_ALL_E << __IPIPE_SHIFT_R)
+
+#define IPIPE_KEVT_SCHEDULE	0
+#define IPIPE_KEVT_SIGWAKE	1
+#define IPIPE_KEVT_SETSCHED	2
+#define IPIPE_KEVT_SETAFFINITY	3
+#define IPIPE_KEVT_EXIT		4
+#define IPIPE_KEVT_CLEANUP	5
+#define IPIPE_KEVT_HOSTRT	6
+#define IPIPE_KEVT_CLOCKFREQ	7
+#define IPIPE_KEVT_USERINTRET	8
+#define IPIPE_KEVT_PTRESUME	9
+
+typedef void (*ipipe_irq_ackfn_t)(struct irq_desc *desc);
+
+typedef void (*ipipe_irq_handler_t)(unsigned int irq,
+				    void *cookie);
+
+struct ipipe_domain {
+	int context_offset;
+	struct ipipe_irqdesc {
+		unsigned long control;
+		ipipe_irq_ackfn_t ackfn;
+		ipipe_irq_handler_t handler;
+		void *cookie;
+	} ____cacheline_aligned irqs[IPIPE_NR_IRQS];
+	const char *name;
+	struct mutex mutex;
+};
+
+static inline void *
+__ipipe_irq_cookie(struct ipipe_domain *ipd, unsigned int irq)
+{
+	return ipd->irqs[irq].cookie;
+}
+
+static inline ipipe_irq_handler_t
+__ipipe_irq_handler(struct ipipe_domain *ipd, unsigned int irq)
+{
+	return ipd->irqs[irq].handler;
+}
+
+extern struct ipipe_domain ipipe_root;
+
+#define ipipe_root_domain (&ipipe_root)
+
+extern struct ipipe_domain *ipipe_head_domain;
+
+struct ipipe_percpu_domain_data {
+	unsigned long status;	/* <= Must be first in struct. */
+	unsigned long irqpend_0map;
+#if __IPIPE_IRQMAP_LEVELS >= 3
+	unsigned long irqpend_1map[IPIPE_IRQ_1MAPSZ];
+#if __IPIPE_IRQMAP_LEVELS >= 4
+	unsigned long irqpend_2map[IPIPE_IRQ_2MAPSZ];
+#endif
+#endif
+	unsigned long irqpend_map[IPIPE_IRQ_MAPSZ];
+	unsigned long irqheld_map[IPIPE_IRQ_MAPSZ];
+	unsigned long irqall[IPIPE_NR_IRQS];
+	struct ipipe_domain *domain;
+	int coflags;
+};
+
+struct ipipe_percpu_data {
+	struct ipipe_percpu_domain_data root;
+	struct ipipe_percpu_domain_data head;
+	struct ipipe_percpu_domain_data *curr;
+	struct pt_regs tick_regs;
+	int hrtimer_irq;
+	struct task_struct *task_hijacked;
+	struct task_struct *rqlock_owner;
+	struct ipipe_vm_notifier *vm_notifier;
+	unsigned long nmi_state;
+	struct mm_struct *active_mm;
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+	int context_check;
+	int context_check_saved;
+#endif
+};
+
+/*
+ * CAREFUL: all accessors based on __ipipe_raw_cpu_ptr() you may find
+ * in this file must only be used while hw interrupts are off, so that
+ * the caller cannot migrate to another CPU, regardless of the running
+ * domain.
+ */
+DECLARE_PER_CPU(struct ipipe_percpu_data, ipipe_percpu);
+
+static inline struct ipipe_percpu_domain_data *
+__context_of(struct ipipe_percpu_data *p, struct ipipe_domain *ipd)
+{
+	return (void *)p + ipd->context_offset;
+}
+
+/**
+ * ipipe_percpu_context - return the address of the pipeline context
+ * data for a domain on a given CPU.
+ *
+ * NOTE: this is the slowest accessor, use it carefully. Prefer
+ * ipipe_this_cpu_context() for requests targeted at the current
+ * CPU. Additionally, if the target domain is known at build time,
+ * consider ipipe_this_cpu_{root, head}_context().
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_percpu_context(struct ipipe_domain *ipd, int cpu)
+{
+	return __context_of(&per_cpu(ipipe_percpu, cpu), ipd);
+}
+
+/**
+ * ipipe_this_cpu_context - return the address of the pipeline context
+ * data for a domain on the current CPU. hw IRQs must be off.
+ *
+ * NOTE: this accessor is a bit faster, but since we don't know which
+ * one of "root" or "head" ipd refers to, we still need to compute the
+ * context address from its offset.
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_context(struct ipipe_domain *ipd)
+{
+	return __context_of(__ipipe_raw_cpu_ptr(&ipipe_percpu), ipd);
+}
+
+/**
+ * ipipe_this_cpu_root_context - return the address of the pipeline
+ * context data for the root domain on the current CPU. hw IRQs must
+ * be off.
+ *
+ * NOTE: this accessor is recommended when the domain we refer to is
+ * known at build time to be the root one.
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_root_context(void)
+{
+	return __ipipe_raw_cpu_ptr(&ipipe_percpu.root);
+}
+
+/**
+ * ipipe_this_cpu_head_context - return the address of the pipeline
+ * context data for the registered head domain on the current CPU. hw
+ * IRQs must be off.
+ *
+ * NOTE: this accessor is recommended when the domain we refer to is
+ * known at build time to be the registered head domain. This address
+ * is always different from the context data of the root domain in
+ * absence of registered head domain. To get the address of the
+ * context data for the domain leading the pipeline at the time of the
+ * call (which may be root in absence of registered head domain), use
+ * ipipe_this_cpu_leading_context() instead.
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_head_context(void)
+{
+	return __ipipe_raw_cpu_ptr(&ipipe_percpu.head);
+}
+
+/**
+ * ipipe_this_cpu_leading_context - return the address of the pipeline
+ * context data for the domain leading the pipeline on the current
+ * CPU. hw IRQs must be off.
+ *
+ * NOTE: this accessor is required when either root or a registered
+ * head domain may be the final target of this call, depending on
+ * whether the high priority domain was installed via
+ * ipipe_register_head().
+ */
+static inline struct ipipe_percpu_domain_data *
+ipipe_this_cpu_leading_context(void)
+{
+	return ipipe_this_cpu_context(ipipe_head_domain);
+}
+
+/**
+ * __ipipe_get_current_context() - return the address of the pipeline
+ * context data of the domain running on the current CPU. hw IRQs must
+ * be off.
+ */
+static inline struct ipipe_percpu_domain_data *__ipipe_get_current_context(void)
+{
+	return __ipipe_raw_cpu_read(ipipe_percpu.curr);
+}
+
+#define __ipipe_current_context __ipipe_get_current_context()
+
+/**
+ * __ipipe_set_current_context() - switch the current CPU to the
+ * specified domain context.  hw IRQs must be off.
+ *
+ * NOTE: this is the only way to change the current domain for the
+ * current CPU. Don't bypass.
+ */
+static inline
+void __ipipe_set_current_context(struct ipipe_percpu_domain_data *pd)
+{
+	struct ipipe_percpu_data *p;
+	p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+	p->curr = pd;
+}
+
+/**
+ * __ipipe_set_current_domain() - switch the current CPU to the
+ * specified domain. This is equivalent to calling
+ * __ipipe_set_current_context() with the context data of that
+ * domain. hw IRQs must be off.
+ */
+static inline void __ipipe_set_current_domain(struct ipipe_domain *ipd)
+{
+	struct ipipe_percpu_data *p;
+	p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+	p->curr = __context_of(p, ipd);
+}
+
+static inline struct ipipe_percpu_domain_data *ipipe_current_context(void)
+{
+	struct ipipe_percpu_domain_data *pd;
+	unsigned long flags;
+
+	flags = hard_smp_local_irq_save();
+	pd = __ipipe_get_current_context();
+	hard_smp_local_irq_restore(flags);
+
+	return pd;
+}
+
+static inline struct ipipe_domain *__ipipe_get_current_domain(void)
+{
+	return __ipipe_get_current_context()->domain;
+}
+
+#define __ipipe_current_domain	__ipipe_get_current_domain()
+
+/**
+ * ipipe_get_current_domain() - return the address of the pipeline
+ * domain running on the current CPU. hw IRQs must be off.
+ */
+static inline struct ipipe_domain *ipipe_get_current_domain(void)
+{
+	struct ipipe_domain *ipd;
+	unsigned long flags;
+
+	flags = hard_smp_local_irq_save();
+	ipd = __ipipe_get_current_domain();
+	hard_smp_local_irq_restore(flags);
+
+	return ipd;
+}
+
+#define ipipe_current_domain	ipipe_get_current_domain()
+
+#define __ipipe_root_p	(__ipipe_current_domain == ipipe_root_domain)
+#define ipipe_root_p	(ipipe_current_domain == ipipe_root_domain)
+
+#ifdef CONFIG_SMP
+#define __ipipe_root_status	(ipipe_this_cpu_root_context()->status)
+#else
+extern unsigned long __ipipe_root_status;
+#endif
+
+#define __ipipe_head_status	(ipipe_this_cpu_head_context()->status)
+
+/**
+ * __ipipe_ipending_p() - Whether we have interrupts pending
+ * (i.e. logged) for the given domain context on the current CPU. hw
+ * IRQs must be off.
+ */
+static inline int __ipipe_ipending_p(struct ipipe_percpu_domain_data *pd)
+{
+	return pd->irqpend_0map != 0;
+}
+
+static inline unsigned long
+__ipipe_cpudata_irq_hits(struct ipipe_domain *ipd, int cpu, unsigned int irq)
+{
+	return ipipe_percpu_context(ipd, cpu)->irqall[irq];
+}
+
+#endif /* CONFIG_IPIPE */
+
+#endif	/* !__LINUX_IPIPE_DOMAIN_H */
diff --git a/include/linux/ipipe_lock.h b/include/linux/ipipe_lock.h
new file mode 100644
index 000000000000..da6188d45501
--- /dev/null
+++ b/include/linux/ipipe_lock.h
@@ -0,0 +1,329 @@
+/*   -*- linux-c -*-
+ *   include/linux/ipipe_lock.h
+ *
+ *   Copyright (C) 2009 Philippe Gerum.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ *   USA; either version 2 of the License, or (at your option) any later
+ *   version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_LOCK_H
+#define __LINUX_IPIPE_LOCK_H
+
+#include <asm-generic/ipipe.h>
+
+typedef struct {
+	arch_spinlock_t arch_lock;
+} __ipipe_spinlock_t;
+
+#define ipipe_spinlock(lock)	((__ipipe_spinlock_t *)(lock))
+#define ipipe_spinlock_p(lock)							\
+	__builtin_types_compatible_p(typeof(lock), __ipipe_spinlock_t *) ||	\
+	__builtin_types_compatible_p(typeof(lock), __ipipe_spinlock_t [])
+
+#define std_spinlock_raw(lock)	((raw_spinlock_t *)(lock))
+#define std_spinlock_raw_p(lock)					\
+	__builtin_types_compatible_p(typeof(lock), raw_spinlock_t *) ||	\
+	__builtin_types_compatible_p(typeof(lock), raw_spinlock_t [])
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+
+#define PICK_SPINLOCK_IRQSAVE(lock, flags)				\
+	do {								\
+		if (ipipe_spinlock_p(lock))				\
+			(flags) = __ipipe_spin_lock_irqsave(ipipe_spinlock(lock)); \
+		else if (std_spinlock_raw_p(lock))				\
+			__real_raw_spin_lock_irqsave(std_spinlock_raw(lock), flags); \
+		else __bad_lock_type();					\
+	} while (0)
+
+#define PICK_SPINTRYLOCK_IRQSAVE(lock, flags)				\
+	({								\
+		int __ret__;						\
+		if (ipipe_spinlock_p(lock))				\
+			__ret__ = __ipipe_spin_trylock_irqsave(ipipe_spinlock(lock), &(flags)); \
+		else if (std_spinlock_raw_p(lock))				\
+			__ret__ = __real_raw_spin_trylock_irqsave(std_spinlock_raw(lock), flags); \
+		else __bad_lock_type();					\
+		__ret__;						\
+	 })
+
+#define PICK_SPINTRYLOCK_IRQ(lock)					\
+	({								\
+		int __ret__;						\
+		if (ipipe_spinlock_p(lock))				\
+			__ret__ = __ipipe_spin_trylock_irq(ipipe_spinlock(lock)); \
+		else if (std_spinlock_raw_p(lock))				\
+			__ret__ = __real_raw_spin_trylock_irq(std_spinlock_raw(lock)); \
+		else __bad_lock_type();					\
+		__ret__;						\
+	 })
+
+#define PICK_SPINUNLOCK_IRQRESTORE(lock, flags)				\
+	do {								\
+		if (ipipe_spinlock_p(lock))				\
+			__ipipe_spin_unlock_irqrestore(ipipe_spinlock(lock), flags); \
+		else if (std_spinlock_raw_p(lock)) {			\
+			__ipipe_spin_unlock_debug(flags);		\
+			__real_raw_spin_unlock_irqrestore(std_spinlock_raw(lock), flags); \
+		} else __bad_lock_type();				\
+	} while (0)
+
+#define PICK_SPINOP(op, lock)						\
+	({								\
+		if (ipipe_spinlock_p(lock))				\
+			arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+		else if (std_spinlock_raw_p(lock))			\
+			__real_raw_spin##op(std_spinlock_raw(lock));	\
+		else __bad_lock_type();					\
+		(void)0;						\
+	})
+
+#define PICK_SPINOP_RET(op, lock, type)					\
+	({								\
+		type __ret__;						\
+		if (ipipe_spinlock_p(lock))				\
+			__ret__ = arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+		else if (std_spinlock_raw_p(lock))			\
+			__ret__ = __real_raw_spin##op(std_spinlock_raw(lock)); \
+		else { __ret__ = -1; __bad_lock_type(); }		\
+		__ret__;						\
+	})
+
+#else /* !CONFIG_PREEMPT_RT_FULL */
+
+#define std_spinlock(lock)	((spinlock_t *)(lock))
+#define std_spinlock_p(lock)						\
+	__builtin_types_compatible_p(typeof(lock), spinlock_t *) ||	\
+	__builtin_types_compatible_p(typeof(lock), spinlock_t [])
+
+#define PICK_SPINLOCK_IRQSAVE(lock, flags)				\
+	do {								\
+		if (ipipe_spinlock_p(lock))				\
+			(flags) = __ipipe_spin_lock_irqsave(ipipe_spinlock(lock)); \
+		else if (std_spinlock_raw_p(lock))				\
+			__real_raw_spin_lock_irqsave(std_spinlock_raw(lock), flags); \
+		else if (std_spinlock_p(lock))				\
+			__real_raw_spin_lock_irqsave(&std_spinlock(lock)->rlock, flags); \
+		else __bad_lock_type();					\
+	} while (0)
+
+#define PICK_SPINTRYLOCK_IRQSAVE(lock, flags)				\
+	({								\
+		int __ret__;						\
+		if (ipipe_spinlock_p(lock))				\
+			__ret__ = __ipipe_spin_trylock_irqsave(ipipe_spinlock(lock), &(flags)); \
+		else if (std_spinlock_raw_p(lock))				\
+			__ret__ = __real_raw_spin_trylock_irqsave(std_spinlock_raw(lock), flags); \
+		else if (std_spinlock_p(lock))				\
+			__ret__ = __real_raw_spin_trylock_irqsave(&std_spinlock(lock)->rlock, flags); \
+		else __bad_lock_type();					\
+		__ret__;						\
+	 })
+
+#define PICK_SPINTRYLOCK_IRQ(lock)					\
+	({								\
+		int __ret__;						\
+		if (ipipe_spinlock_p(lock))				\
+			__ret__ = __ipipe_spin_trylock_irq(ipipe_spinlock(lock)); \
+		else if (std_spinlock_raw_p(lock))				\
+			__ret__ = __real_raw_spin_trylock_irq(std_spinlock_raw(lock)); \
+		else if (std_spinlock_p(lock))				\
+			__ret__ = __real_raw_spin_trylock_irq(&std_spinlock(lock)->rlock); \
+		else __bad_lock_type();					\
+		__ret__;						\
+	 })
+
+#define PICK_SPINUNLOCK_IRQRESTORE(lock, flags)				\
+	do {								\
+		if (ipipe_spinlock_p(lock))				\
+			__ipipe_spin_unlock_irqrestore(ipipe_spinlock(lock), flags); \
+		else {							\
+			__ipipe_spin_unlock_debug(flags);		\
+			if (std_spinlock_raw_p(lock))			\
+				__real_raw_spin_unlock_irqrestore(std_spinlock_raw(lock), flags); \
+			else if (std_spinlock_p(lock))			\
+				__real_raw_spin_unlock_irqrestore(&std_spinlock(lock)->rlock, flags); \
+		}							\
+	} while (0)
+
+#define PICK_SPINOP(op, lock)						\
+	({								\
+		if (ipipe_spinlock_p(lock))				\
+			arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+		else if (std_spinlock_raw_p(lock))			\
+			__real_raw_spin##op(std_spinlock_raw(lock));	\
+		else if (std_spinlock_p(lock))				\
+			__real_raw_spin##op(&std_spinlock(lock)->rlock); \
+		else __bad_lock_type();					\
+		(void)0;						\
+	})
+
+#define PICK_SPINOP_RET(op, lock, type)					\
+	({								\
+		type __ret__;						\
+		if (ipipe_spinlock_p(lock))				\
+			__ret__ = arch_spin##op(&ipipe_spinlock(lock)->arch_lock); \
+		else if (std_spinlock_raw_p(lock))			\
+			__ret__ = __real_raw_spin##op(std_spinlock_raw(lock)); \
+		else if (std_spinlock_p(lock))				\
+			__ret__ = __real_raw_spin##op(&std_spinlock(lock)->rlock); \
+		else { __ret__ = -1; __bad_lock_type(); }		\
+		__ret__;						\
+	})
+
+#endif /* !CONFIG_PREEMPT_RT_FULL */
+
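+/*
+ * Sketch of what the PICK_* dispatchers above enable (the lock names
+ * below are purely illustrative): the generic raw_spin_* wrappers
+ * redefined in <linux/spinlock.h> by this patch route each call to
+ * either the I-pipe implementation (hard IRQ masking) or the regular
+ * one, based on the static type of the lock.
+ *
+ *	static IPIPE_DEFINE_RAW_SPINLOCK(oob_lock);   I-pipe aware
+ *	static DEFINE_RAW_SPINLOCK(gpos_lock);        regular kernel lock
+ *	unsigned long flags;
+ *
+ *	raw_spin_lock_irqsave(&oob_lock, flags);      hard-disables IRQs
+ *	raw_spin_unlock_irqrestore(&oob_lock, flags);
+ *
+ *	raw_spin_lock_irqsave(&gpos_lock, flags);     virtual masking only
+ *	raw_spin_unlock_irqrestore(&gpos_lock, flags);
+ */
+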
+#define arch_spin_lock_init(lock)					\
+	do {								\
+		IPIPE_DEFINE_SPINLOCK(__lock__);			\
+		*((ipipe_spinlock_t *)lock) = __lock__;			\
+	} while (0)
+
+#define arch_spin_lock_irq(lock)					\
+	do {								\
+		hard_local_irq_disable();				\
+		arch_spin_lock(lock);					\
+	} while (0)
+
+#define arch_spin_unlock_irq(lock)					\
+	do {								\
+		arch_spin_unlock(lock);					\
+		hard_local_irq_enable();				\
+	} while (0)
+
+typedef struct {
+	arch_rwlock_t arch_lock;
+} __ipipe_rwlock_t;
+
+#define ipipe_rwlock_p(lock)						\
+	__builtin_types_compatible_p(typeof(lock), __ipipe_rwlock_t *)
+
+#define std_rwlock_p(lock)						\
+	__builtin_types_compatible_p(typeof(lock), rwlock_t *)
+
+#define ipipe_rwlock(lock)	((__ipipe_rwlock_t *)(lock))
+#define std_rwlock(lock)	((rwlock_t *)(lock))
+
+#define PICK_RWOP(op, lock)						\
+	do {								\
+		if (ipipe_rwlock_p(lock))				\
+			arch##op(&ipipe_rwlock(lock)->arch_lock);	\
+		else if (std_rwlock_p(lock))				\
+			_raw##op(std_rwlock(lock));			\
+		else __bad_lock_type();					\
+	} while (0)
+
+extern int __bad_lock_type(void);
+
+#ifdef CONFIG_IPIPE
+
+#define ipipe_spinlock_t		__ipipe_spinlock_t
+#define IPIPE_DEFINE_RAW_SPINLOCK(x)	ipipe_spinlock_t x = IPIPE_SPIN_LOCK_UNLOCKED
+#define IPIPE_DECLARE_RAW_SPINLOCK(x)	extern ipipe_spinlock_t x
+#define IPIPE_DEFINE_SPINLOCK(x)	IPIPE_DEFINE_RAW_SPINLOCK(x)
+#define IPIPE_DECLARE_SPINLOCK(x)	IPIPE_DECLARE_RAW_SPINLOCK(x)
+
+#define IPIPE_SPIN_LOCK_UNLOCKED					\
+	(__ipipe_spinlock_t) {	.arch_lock = __ARCH_SPIN_LOCK_UNLOCKED }
+
+#define spin_lock_irqsave_cond(lock, flags) \
+	spin_lock_irqsave(lock, flags)
+
+#define spin_unlock_irqrestore_cond(lock, flags) \
+	spin_unlock_irqrestore(lock, flags)
+
+#define raw_spin_lock_irqsave_cond(lock, flags) \
+	raw_spin_lock_irqsave(lock, flags)
+
+#define raw_spin_unlock_irqrestore_cond(lock, flags) \
+	raw_spin_unlock_irqrestore(lock, flags)
+
+void __ipipe_spin_lock_irq(ipipe_spinlock_t *lock);
+
+int __ipipe_spin_trylock_irq(ipipe_spinlock_t *lock);
+
+void __ipipe_spin_unlock_irq(ipipe_spinlock_t *lock);
+
+unsigned long __ipipe_spin_lock_irqsave(ipipe_spinlock_t *lock);
+
+int __ipipe_spin_trylock_irqsave(ipipe_spinlock_t *lock,
+				 unsigned long *x);
+
+void __ipipe_spin_unlock_irqrestore(ipipe_spinlock_t *lock,
+				    unsigned long x);
+
+void __ipipe_spin_unlock_irqbegin(ipipe_spinlock_t *lock);
+
+void __ipipe_spin_unlock_irqcomplete(unsigned long x);
+
+#if defined(CONFIG_IPIPE_DEBUG_INTERNAL) && defined(CONFIG_SMP)
+void __ipipe_spin_unlock_debug(unsigned long flags);
+#else
+#define __ipipe_spin_unlock_debug(flags)  do { } while (0)
+#endif
+
+#define ipipe_rwlock_t			__ipipe_rwlock_t
+#define IPIPE_DEFINE_RWLOCK(x)		ipipe_rwlock_t x = IPIPE_RW_LOCK_UNLOCKED
+#define IPIPE_DECLARE_RWLOCK(x)		extern ipipe_rwlock_t x
+
+#define IPIPE_RW_LOCK_UNLOCKED	\
+	(__ipipe_rwlock_t) { .arch_lock = __ARCH_RW_LOCK_UNLOCKED }
+
+#else /* !CONFIG_IPIPE */
+
+#define ipipe_spinlock_t		spinlock_t
+#define IPIPE_DEFINE_SPINLOCK(x)	DEFINE_SPINLOCK(x)
+#define IPIPE_DECLARE_SPINLOCK(x)	extern spinlock_t x
+#define IPIPE_SPIN_LOCK_UNLOCKED	__SPIN_LOCK_UNLOCKED(unknown)
+#define IPIPE_DEFINE_RAW_SPINLOCK(x)	DEFINE_RAW_SPINLOCK(x)
+#define IPIPE_DECLARE_RAW_SPINLOCK(x)	extern raw_spinlock_t x
+
+#define spin_lock_irqsave_cond(lock, flags)		\
+	do {						\
+		(void)(flags);				\
+		spin_lock(lock);			\
+	} while(0)
+
+#define spin_unlock_irqrestore_cond(lock, flags)	\
+	spin_unlock(lock)
+
+#define raw_spin_lock_irqsave_cond(lock, flags) \
+	do {					\
+		(void)(flags);			\
+		raw_spin_lock(lock);		\
+	} while(0)
+
+#define raw_spin_unlock_irqrestore_cond(lock, flags) \
+	raw_spin_unlock(lock)
+
+#define __ipipe_spin_lock_irq(lock)		do { } while (0)
+#define __ipipe_spin_unlock_irq(lock)		do { } while (0)
+#define __ipipe_spin_lock_irqsave(lock)		0
+#define __ipipe_spin_trylock_irq(lock)		1
+#define __ipipe_spin_trylock_irqsave(lock, x)	({ (void)(x); 1; })
+#define __ipipe_spin_unlock_irqrestore(lock, x)	do { (void)(x); } while (0)
+#define __ipipe_spin_unlock_irqbegin(lock)	spin_unlock(lock)
+#define __ipipe_spin_unlock_irqcomplete(x)	do { (void)(x); } while (0)
+#define __ipipe_spin_unlock_debug(flags)	do { } while (0)
+
+#define ipipe_rwlock_t			rwlock_t
+#define IPIPE_DEFINE_RWLOCK(x)		DEFINE_RWLOCK(x)
+#define IPIPE_DECLARE_RWLOCK(x)		extern rwlock_t x
+#define IPIPE_RW_LOCK_UNLOCKED		__RW_LOCK_UNLOCKED(unknown)
+
+#endif /* !CONFIG_IPIPE */
+
+#endif /* !__LINUX_IPIPE_LOCK_H */
diff --git a/include/linux/ipipe_tickdev.h b/include/linux/ipipe_tickdev.h
new file mode 100644
index 000000000000..54d1e2daad6e
--- /dev/null
+++ b/include/linux/ipipe_tickdev.h
@@ -0,0 +1,167 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_tickdev.h
+ *
+ * Copyright (C) 2007 Philippe Gerum.
+ * Copyright (C) 2012 Gilles Chanteperdrix
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef __LINUX_IPIPE_TICKDEV_H
+#define __LINUX_IPIPE_TICKDEV_H
+
+#include <linux/list.h>
+#include <linux/cpumask.h>
+#include <linux/clockchips.h>
+#include <linux/ipipe_domain.h>
+#include <linux/clocksource.h>
+#include <linux/timekeeper_internal.h>
+
+#ifdef CONFIG_IPIPE
+
+struct clock_event_device;
+
+struct ipipe_hostrt_data {
+	short live;
+	seqcount_t seqcount;
+	time_t wall_time_sec;
+	u32 wall_time_nsec;
+	struct timespec wall_to_monotonic;
+	u64 cycle_last;
+	u64 mask;
+	u32 mult;
+	u32 shift;
+};
+
+enum clock_event_mode {
+	CLOCK_EVT_MODE_PERIODIC,
+	CLOCK_EVT_MODE_ONESHOT,
+	CLOCK_EVT_MODE_UNUSED,
+	CLOCK_EVT_MODE_SHUTDOWN,
+};
+
+struct ipipe_timer {
+	int irq;
+	void (*request)(struct ipipe_timer *timer, int steal);
+	int (*set)(unsigned long ticks, void *timer);
+	void (*ack)(void);
+	void (*release)(struct ipipe_timer *timer);
+
+	/* Only if registering a timer directly */
+	const char *name;
+	unsigned rating;
+	unsigned long freq;
+	unsigned long min_delay_ticks;
+	unsigned long max_delay_ticks;
+	const struct cpumask *cpumask;
+
+	/* For internal use */
+	void *timer_set;	/* pointer passed to ->set() callback */
+	struct clock_event_device *host_timer;
+	struct list_head link;
+
+	/* Conversions between clock frequency and timer frequency */
+	unsigned c2t_integ;
+	unsigned c2t_frac;
+
+	/* For clockevent interception */
+	u32 real_mult;
+	u32 real_shift;
+	void (*mode_handler)(enum clock_event_mode mode,
+			     struct clock_event_device *);
+	int orig_mode;
+	int (*orig_set_state_periodic)(struct clock_event_device *);
+	int (*orig_set_state_oneshot)(struct clock_event_device *);
+	int (*orig_set_state_oneshot_stopped)(struct clock_event_device *);
+	int (*orig_set_state_shutdown)(struct clock_event_device *);
+	int (*orig_set_next_event)(unsigned long evt,
+				   struct clock_event_device *cdev);
+	unsigned int (*refresh_freq)(void);
+};
+
+#define __ipipe_hrtimer_irq __ipipe_raw_cpu_read(ipipe_percpu.hrtimer_irq)
+
+extern unsigned long __ipipe_hrtimer_freq;
+
+/*
+ * Called by clockevents_register_device, to register a piggybacked
+ * ipipe timer, if there is one
+ */
+void ipipe_host_timer_register(struct clock_event_device *clkevt);
+
+/*
+ * Called by tick_cleanup_dead_cpu, to drop per-CPU timer devices
+ */
+void ipipe_host_timer_cleanup(struct clock_event_device *clkevt);
+
+/*
+ * Register a standalone ipipe timer
+ */
+void ipipe_timer_register(struct ipipe_timer *timer);
+
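+/*
+ * Registration sketch for a standalone I-pipe timer; every value and
+ * callback name below is illustrative, not mandated by this header:
+ *
+ *	static struct ipipe_timer my_oob_timer = {
+ *		.irq		= 27,
+ *		.set		= my_timer_set,
+ *		.ack		= my_timer_ack,
+ *		.name		= "my-timer",
+ *		.rating		= 300,
+ *		.freq		= 24000000,
+ *		.min_delay_ticks = 10,
+ *		.cpumask	= cpumask_of(0),
+ *	};
+ *
+ *	ipipe_timer_register(&my_oob_timer);
+ */
+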
+/*
+ * Choose the best timer for each CPU and take over its handling.
+ */
+int ipipe_select_timers(const struct cpumask *mask);
+
+/*
+ * Release the per-cpu timers
+ */
+void ipipe_timers_release(void);
+
+/*
+ * Start handling the per-CPU timer IRQ, and intercept the Linux
+ * clockevent device callbacks.
+ */
+int ipipe_timer_start(void (*tick_handler)(void),
+		      void (*emumode)(enum clock_event_mode mode,
+				      struct clock_event_device *cdev),
+		      int (*emutick)(unsigned long evt,
+				     struct clock_event_device *cdev),
+		      unsigned cpu);
+
+/*
+ * Stop handling a per-cpu timer
+ */
+void ipipe_timer_stop(unsigned cpu);
+
+/*
+ * Program the timer
+ */
+void ipipe_timer_set(unsigned long delay);
+
+const char *ipipe_timer_name(void);
+
+unsigned ipipe_timer_ns2ticks(struct ipipe_timer *timer, unsigned ns);
+
+void __ipipe_timer_refresh_freq(unsigned int hrclock_freq);
+
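+/*
+ * Typical co-kernel usage of the interface above (sketch; the tick
+ * handler, emulation callbacks and CPU iteration are illustrative):
+ *
+ *	ipipe_select_timers(cpu_online_mask);
+ *	for_each_online_cpu(cpu)
+ *		ipipe_timer_start(oob_tick_handler, oob_mode_emul,
+ *				  oob_tick_emul, cpu);
+ *	...
+ *	ipipe_timer_set(delay);    program the next shot, usually from
+ *	                           the out-of-band tick handler
+ *	...
+ *	for_each_online_cpu(cpu)
+ *		ipipe_timer_stop(cpu);
+ *	ipipe_timers_release();
+ */
+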
+#else /* !CONFIG_IPIPE */
+
+#define ipipe_host_timer_register(clkevt) do { } while (0)
+
+#define ipipe_host_timer_cleanup(clkevt) do { } while (0)
+
+#endif /* !CONFIG_IPIPE */
+
+#ifdef CONFIG_IPIPE_HAVE_HOSTRT
+void ipipe_update_hostrt(struct timekeeper *tk);
+#else
+static inline void
+ipipe_update_hostrt(struct timekeeper *tk) {}
+#endif
+
+#endif /* __LINUX_IPIPE_TICKDEV_H */
diff --git a/include/linux/ipipe_trace.h b/include/linux/ipipe_trace.h
new file mode 100644
index 000000000000..7d0c867a360b
--- /dev/null
+++ b/include/linux/ipipe_trace.h
@@ -0,0 +1,78 @@
+/* -*- linux-c -*-
+ * include/linux/ipipe_trace.h
+ *
+ * Copyright (C) 2005 Luotao Fu.
+ *               2005-2007 Jan Kiszka.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#ifndef _LINUX_IPIPE_TRACE_H
+#define _LINUX_IPIPE_TRACE_H
+
+#ifdef CONFIG_IPIPE_TRACE
+
+#include <linux/types.h>
+
+struct pt_regs;
+
+void ipipe_trace_begin(unsigned long v);
+void ipipe_trace_end(unsigned long v);
+void ipipe_trace_freeze(unsigned long v);
+void ipipe_trace_special(unsigned char special_id, unsigned long v);
+void ipipe_trace_pid(pid_t pid, short prio);
+void ipipe_trace_event(unsigned char id, unsigned long delay_tsc);
+int ipipe_trace_max_reset(void);
+int ipipe_trace_frozen_reset(void);
+void ipipe_trace_irqbegin(int irq, struct pt_regs *regs);
+void ipipe_trace_irqend(int irq, struct pt_regs *regs);
+
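+/*
+ * Usage sketch: bracket a path of interest with begin/end so the
+ * tracer keeps its longest recorded instance, and freeze the log when
+ * something abnormal is detected. The marker values and the condition
+ * below are illustrative only.
+ *
+ *	ipipe_trace_begin(0x100);
+ *	... path under measurement ...
+ *	ipipe_trace_end(0x100);
+ *
+ *	if (deadline_missed)
+ *		ipipe_trace_freeze(0x200);
+ */
+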
+#else /* !CONFIG_IPIPE_TRACE */
+
+#define ipipe_trace_begin(v)			do { (void)(v); } while(0)
+#define ipipe_trace_end(v)			do { (void)(v); } while(0)
+#define ipipe_trace_freeze(v)			do { (void)(v); } while(0)
+#define ipipe_trace_special(id, v)		do { (void)(id); (void)(v); } while(0)
+#define ipipe_trace_pid(pid, prio)		do { (void)(pid); (void)(prio); } while(0)
+#define ipipe_trace_event(id, delay_tsc)	do { (void)(id); (void)(delay_tsc); } while(0)
+#define ipipe_trace_max_reset()			({ 0; })
+#define ipipe_trace_frozen_reset()		({ 0; })
+#define ipipe_trace_irqbegin(irq, regs)		do { } while(0)
+#define ipipe_trace_irqend(irq, regs)		do { } while(0)
+
+#endif /* !CONFIG_IPIPE_TRACE */
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+void ipipe_trace_panic_freeze(void);
+void ipipe_trace_panic_dump(void);
+#else
+static inline void ipipe_trace_panic_freeze(void) { }
+static inline void ipipe_trace_panic_dump(void) { }
+#endif
+
+#ifdef CONFIG_IPIPE_TRACE_IRQSOFF
+#define ipipe_trace_irq_entry(irq)	ipipe_trace_begin(irq)
+#define ipipe_trace_irq_exit(irq)	ipipe_trace_end(irq)
+#define ipipe_trace_irqsoff()		ipipe_trace_begin(0x80000000UL)
+#define ipipe_trace_irqson()		ipipe_trace_end(0x80000000UL)
+#else
+#define ipipe_trace_irq_entry(irq)	do { (void)(irq);} while(0)
+#define ipipe_trace_irq_exit(irq)	do { (void)(irq);} while(0)
+#define ipipe_trace_irqsoff()		do { } while(0)
+#define ipipe_trace_irqson()		do { } while(0)
+#endif
+
+#endif	/* !__LINUX_IPIPE_TRACE_H */
diff --git a/include/linux/irq.h b/include/linux/irq.h
index e9e69c511ea9..335aa3309463 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -498,6 +498,11 @@ struct irq_chip {
 
 	void		(*irq_bus_lock)(struct irq_data *data);
 	void		(*irq_bus_sync_unlock)(struct irq_data *data);
+#ifdef CONFIG_IPIPE
+	void		(*irq_move)(struct irq_data *data);
+	void		(*irq_hold)(struct irq_data *data);
+	void		(*irq_release)(struct irq_data *data);
+#endif /* CONFIG_IPIPE */
 
 	void		(*irq_cpu_online)(struct irq_data *data);
 	void		(*irq_cpu_offline)(struct irq_data *data);
@@ -542,6 +547,7 @@ struct irq_chip {
  * IRQCHIP_EOI_THREADED:	Chip requires eoi() on unmask in threaded mode
  * IRQCHIP_SUPPORTS_LEVEL_MSI	Chip can provide two doorbells for Level MSIs
  * IRQCHIP_SUPPORTS_NMI:	Chip can deliver NMIs, only for root irqchips
+ * IRQCHIP_PIPELINE_SAFE:	Chip can work in pipelined mode
  */
 enum {
 	IRQCHIP_SET_TYPE_MASKED		= (1 <<  0),
@@ -553,6 +559,7 @@ enum {
 	IRQCHIP_EOI_THREADED		= (1 <<  6),
 	IRQCHIP_SUPPORTS_LEVEL_MSI	= (1 <<  7),
 	IRQCHIP_SUPPORTS_NMI		= (1 <<  8),
+	IRQCHIP_PIPELINE_SAFE		= (1 <<  9),
 };
 
 #include <linux/irqdesc.h>
@@ -649,6 +656,11 @@ extern void irq_chip_mask_parent(struct irq_data *data);
 extern void irq_chip_mask_ack_parent(struct irq_data *data);
 extern void irq_chip_unmask_parent(struct irq_data *data);
 extern void irq_chip_eoi_parent(struct irq_data *data);
+#ifdef CONFIG_IPIPE
+extern void irq_chip_hold_parent(struct irq_data *data);
+extern void irq_chip_release_parent(struct irq_data *data);
+#endif
+
 extern int irq_chip_set_affinity_parent(struct irq_data *data,
 					const struct cpumask *dest,
 					bool force);
@@ -775,7 +787,14 @@ extern int irq_set_irq_type(unsigned int irq, unsigned int type);
 extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
 extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
 				struct msi_desc *entry);
-extern struct irq_data *irq_get_irq_data(unsigned int irq);
+
+static inline __attribute__((const)) struct irq_data *
+irq_get_irq_data(unsigned int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	return desc ? &desc->irq_data : NULL;
+}
 
 static inline struct irq_chip *irq_get_chip(unsigned int irq)
 {
@@ -1018,7 +1037,11 @@ struct irq_chip_type {
  * different flow mechanisms (level/edge) for it.
  */
 struct irq_chip_generic {
+#ifdef CONFIG_IPIPE
+	ipipe_spinlock_t	lock;
+#else
 	raw_spinlock_t		lock;
+#endif
 	void __iomem		*reg_base;
 	u32			(*reg_readl)(void __iomem *addr);
 	void			(*reg_writel)(u32 val, void __iomem *addr);
@@ -1146,18 +1169,28 @@ static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d)
 #define IRQ_MSK(n) (u32)((n) < 32 ? ((1 << (n)) - 1) : UINT_MAX)
 
 #ifdef CONFIG_SMP
-static inline void irq_gc_lock(struct irq_chip_generic *gc)
+static inline unsigned long irq_gc_lock(struct irq_chip_generic *gc)
 {
-	raw_spin_lock(&gc->lock);
+	unsigned long flags = 0;
+	raw_spin_lock_irqsave_cond(&gc->lock, flags);
+	return flags;
 }
 
-static inline void irq_gc_unlock(struct irq_chip_generic *gc)
+static inline void
+irq_gc_unlock(struct irq_chip_generic *gc, unsigned long flags)
 {
-	raw_spin_unlock(&gc->lock);
+	raw_spin_unlock_irqrestore_cond(&gc->lock, flags);
 }
 #else
-static inline void irq_gc_lock(struct irq_chip_generic *gc) { }
-static inline void irq_gc_unlock(struct irq_chip_generic *gc) { }
+static inline unsigned long irq_gc_lock(struct irq_chip_generic *gc)
+{
+	return hard_cond_local_irq_save();
+}
+static inline void
+irq_gc_unlock(struct irq_chip_generic *gc, unsigned long flags)
+{
+	hard_cond_local_irq_restore(flags);
+}
 #endif
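+
+/*
+ * With the pipelined form above, callers carry the flags value
+ * returned by irq_gc_lock(). A sketch (gc and the guarded accesses
+ * are illustrative):
+ *
+ *	unsigned long flags = irq_gc_lock(gc);
+ *	... guarded access to the chip registers ...
+ *	irq_gc_unlock(gc, flags);
+ */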
 
 /*
diff --git a/include/linux/irqchip/arm-gic-common.h b/include/linux/irqchip/arm-gic-common.h
index b9850f5f1906..0a588a04ccc1 100644
--- a/include/linux/irqchip/arm-gic-common.h
+++ b/include/linux/irqchip/arm-gic-common.h
@@ -10,7 +10,12 @@
 #include <linux/types.h>
 #include <linux/ioport.h>
 
+#ifndef CONFIG_IPIPE
 #define GICD_INT_DEF_PRI		0xa0
+#else
+#define GICD_INT_DEF_PRI		0x10
+#endif
+
 #define GICD_INT_DEF_PRI_X4		((GICD_INT_DEF_PRI << 24) |\
 					(GICD_INT_DEF_PRI << 16) |\
 					(GICD_INT_DEF_PRI << 8) |\
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index d6e2ab538ef2..089ccb387762 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -57,6 +57,10 @@ struct irq_desc {
 	struct irq_common_data	irq_common_data;
 	struct irq_data		irq_data;
 	unsigned int __percpu	*kstat_irqs;
+#ifdef CONFIG_IPIPE
+	void			(*ipipe_ack)(struct irq_desc *desc);
+	void			(*ipipe_end)(struct irq_desc *desc);
+#endif /* CONFIG_IPIPE */
 	irq_flow_handler_t	handle_irq;
 #ifdef CONFIG_IRQ_PREFLOW_FASTEOI
 	irq_preflow_handler_t	preflow_handler;
@@ -186,6 +190,10 @@ static inline int irq_desc_has_action(struct irq_desc *desc)
 	return desc->action != NULL;
 }
 
+irq_flow_handler_t
+__ipipe_setup_irq_desc(struct irq_desc *desc, irq_flow_handler_t handle,
+		int is_chained);
+
 static inline int irq_has_action(unsigned int irq)
 {
 	return irq_desc_has_action(irq_to_desc(irq));
@@ -206,7 +214,7 @@ static inline void irq_set_handler_locked(struct irq_data *data,
 {
 	struct irq_desc *desc = irq_data_to_desc(data);
 
-	desc->handle_irq = handler;
+	desc->handle_irq = __ipipe_setup_irq_desc(desc, handler, 0);
 }
 
 /**
@@ -227,7 +235,7 @@ irq_set_chip_handler_name_locked(struct irq_data *data, struct irq_chip *chip,
 {
 	struct irq_desc *desc = irq_data_to_desc(data);
 
-	desc->handle_irq = handler;
+	desc->handle_irq = __ipipe_setup_irq_desc(desc, handler, 0);
 	desc->name = name;
 	data->chip = chip;
 }
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c92c377..d640c584c3d8 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -148,6 +148,18 @@ do {						\
 
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
+#ifdef CONFIG_IPIPE
+#define local_irq_enable_full()		local_irq_enable()
+#define local_irq_disable_full()		\
+	do {					\
+		local_irq_disable();		\
+		hard_local_irq_disable();	\
+	} while (0)
+#else
+#define local_irq_enable_full()		local_irq_enable()
+#define local_irq_disable_full()	local_irq_disable()
+#endif
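+
+/*
+ * Sketch: code that must really mask the hardware interrupt line, not
+ * just the root domain's virtual flag, pairs the _full variants:
+ *
+ *	local_irq_disable_full();
+ *	... hardware-level critical section ...
+ *	local_irq_enable_full();
+ */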
+
 #define local_save_flags(flags)	raw_local_save_flags(flags)
 
 /*
diff --git a/include/linux/irqnr.h b/include/linux/irqnr.h
index 3496baa0b07f..c731f1874042 100644
--- a/include/linux/irqnr.h
+++ b/include/linux/irqnr.h
@@ -6,7 +6,11 @@
 
 
 extern int nr_irqs;
+#if !defined(CONFIG_IPIPE) || defined(CONFIG_SPARSE_IRQ)
 extern struct irq_desc *irq_to_desc(unsigned int irq);
+#else
+#define irq_to_desc(irq)	({ ipipe_virtual_irq_p(irq) ? NULL : &irq_desc[irq]; })
+#endif
 unsigned int irq_get_next_irq(unsigned int offset);
 
 # define for_each_irq_desc(irq, desc)					\
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index d83d403dac2e..4d237ad48fc1 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -15,6 +15,7 @@
 #include <linux/printk.h>
 #include <linux/build_bug.h>
 #include <asm/byteorder.h>
+#include <asm-generic/ipipe.h>
 #include <asm/div64.h>
 #include <uapi/linux/kernel.h>
 #include <asm/div64.h>
@@ -203,9 +204,12 @@ struct user;
 
 #ifdef CONFIG_PREEMPT_VOLUNTARY
 extern int _cond_resched(void);
-# define might_resched() _cond_resched()
+# define might_resched() do { \
+		ipipe_root_only(); \
+		_cond_resched(); \
+	} while (0)
 #else
-# define might_resched() do { } while (0)
+# define might_resched() ipipe_root_only()
 #endif
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 21aa6d736e99..5a7d7d8e1eb2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -265,6 +265,10 @@ struct kvm_vcpu {
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	struct preempt_notifier preempt_notifier;
 #endif
+#ifdef CONFIG_IPIPE
+	struct ipipe_vm_notifier ipipe_notifier;
+	bool ipipe_put_vcpu;
+#endif
 	int cpu;
 	int vcpu_id;
 	int srcu_idx;
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index bbb68dba37cc..ee3c9f08c57e 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -252,7 +252,28 @@ do { \
 
 #endif /* CONFIG_PREEMPT_COUNT */
 
-#ifdef MODULE
+#ifdef CONFIG_IPIPE
+#define hard_preempt_disable()				\
+	({						\
+		unsigned long __flags__;		\
+		__flags__ = hard_local_irq_save();	\
+		if (__ipipe_root_p)			\
+			preempt_disable();		\
+		__flags__;				\
+	})
+
+#define hard_preempt_enable(__flags__)					\
+	do {								\
+		if (__ipipe_root_p) {					\
+			preempt_enable_no_resched();			\
+			hard_local_irq_restore(__flags__);		\
+			if (!hard_irqs_disabled_flags(__flags__))	\
+				preempt_check_resched();		\
+		} else							\
+			hard_local_irq_restore(__flags__);		\
+	} while (0)
+
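+/*
+ * Sketch: a section that must not be preempted regardless of the
+ * current domain can pair these helpers (the work in between is
+ * hypothetical):
+ *
+ *	unsigned long flags = hard_preempt_disable();
+ *	... touch per-CPU state shared with out-of-band code ...
+ *	hard_preempt_enable(flags);
+ */
+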
+#elif defined(MODULE)
 /*
  * Modules have no business playing preemption tricks.
  */
@@ -260,7 +281,7 @@ do { \
 #undef preempt_enable_no_resched
 #undef preempt_enable_no_resched_notrace
 #undef preempt_check_resched
-#endif
+#endif	/* !IPIPE && MODULE */
 
 #define preempt_set_need_resched() \
 do { \
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 3b5cb66d8bc1..f5fca32df941 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -158,6 +158,17 @@ static inline void printk_nmi_direct_enter(void) { }
 static inline void printk_nmi_direct_exit(void) { }
 #endif /* PRINTK_NMI */
 
+#ifdef CONFIG_RAW_PRINTK
+void raw_vprintk(const char *fmt, va_list ap);
+asmlinkage __printf(1, 2)
+void raw_printk(const char *fmt, ...);
+#else
+static inline __cold
+void raw_vprintk(const char *s, va_list ap) { }
+static inline __printf(1, 2) __cold
+void raw_printk(const char *s, ...) { }
+#endif
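+
+/*
+ * raw_printk() may be called from any domain and any context,
+ * including where printk() is unsafe. A sketch (the arguments are
+ * illustrative):
+ *
+ *	raw_printk("oob: irq=%u latency=%lu ns\n", irq, delay_ns);
+ */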
+
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
 int vprintk_emit(int facility, int level,
diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h
index 3dcd617e65ae..ac48e29bbef0 100644
--- a/include/linux/rwlock.h
+++ b/include/linux/rwlock.h
@@ -67,8 +67,8 @@ do {								\
 #define read_trylock(lock)	__cond_lock(lock, _raw_read_trylock(lock))
 #define write_trylock(lock)	__cond_lock(lock, _raw_write_trylock(lock))
 
-#define write_lock(lock)	_raw_write_lock(lock)
-#define read_lock(lock)		_raw_read_lock(lock)
+#define write_lock(lock)	PICK_RWOP(_write_lock, lock)
+#define read_lock(lock)		PICK_RWOP(_read_lock, lock)
 
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 
@@ -102,8 +102,8 @@ do {								\
 #define read_lock_bh(lock)		_raw_read_lock_bh(lock)
 #define write_lock_irq(lock)		_raw_write_lock_irq(lock)
 #define write_lock_bh(lock)		_raw_write_lock_bh(lock)
-#define read_unlock(lock)		_raw_read_unlock(lock)
-#define write_unlock(lock)		_raw_write_unlock(lock)
+#define read_unlock(lock)		PICK_RWOP(_read_unlock, lock)
+#define write_unlock(lock)		PICK_RWOP(_write_unlock, lock)
 #define read_unlock_irq(lock)		_raw_read_unlock_irq(lock)
 #define write_unlock_irq(lock)		_raw_write_unlock_irq(lock)
 
diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index 86ebb4bf9c6e..c1ed96fa0726 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -141,7 +141,9 @@ static inline int __raw_write_trylock(rwlock_t *lock)
  * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
  * not re-enabled during lock-acquire (which the preempt-spin-ops do):
  */
-#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+#if !defined(CONFIG_GENERIC_LOCKBREAK) ||	\
+	defined(CONFIG_DEBUG_LOCK_ALLOC) ||	\
+	defined(CONFIG_IPIPE)
 
 static inline void __raw_read_lock(rwlock_t *lock)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5710b80f8050..b49275aec3dd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -89,7 +89,9 @@ struct task_group;
 #define TASK_WAKING			0x0200
 #define TASK_NOLOAD			0x0400
 #define TASK_NEW			0x0800
-#define TASK_STATE_MAX			0x1000
+#define TASK_HARDENING			0x1000
+#define TASK_NOWAKEUP			0x2000
+#define TASK_STATE_MAX			0x4000
 
 /* Convenience macros for the sake of set_current_state: */
 #define TASK_KILLABLE			(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index dfd82eab2902..5fb7c7e364fb 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -74,6 +74,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_OOM_REAP_QUEUED	26	/* mm was queued for oom_reaper */
 #define MMF_MULTIPROCESS	27	/* mm is shared between processes */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
+#define MMF_VM_PINNED		31	/* ondemand load up and COW disabled */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK)
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 031ce8617df8..0732eca3b6d4 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -91,10 +91,12 @@
 # include <linux/spinlock_up.h>
 #endif
 
+#include <linux/ipipe_lock.h>
+
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 				   struct lock_class_key *key);
-# define raw_spin_lock_init(lock)				\
+# define __real_raw_spin_lock_init(lock)			\
 do {								\
 	static struct lock_class_key __key;			\
 								\
@@ -102,11 +104,14 @@ do {								\
 } while (0)
 
 #else
-# define raw_spin_lock_init(lock)				\
+# define __real_raw_spin_lock_init(lock)			\
 	do { *(lock) = __RAW_SPIN_LOCK_UNLOCKED(lock); } while (0)
 #endif
+#define raw_spin_lock_init(lock)	PICK_SPINOP(_lock_init, lock)
 
-#define raw_spin_is_locked(lock)	arch_spin_is_locked(&(lock)->raw_lock)
+#define __real_raw_spin_is_locked(lock)				\
+	arch_spin_is_locked(&(lock)->raw_lock)
+#define raw_spin_is_locked(lock)	PICK_SPINOP_RET(_is_locked, lock, int)
 
 #ifdef arch_spin_is_contended
 #define raw_spin_is_contended(lock)	arch_spin_is_contended(&(lock)->raw_lock)
@@ -218,9 +223,11 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
  * various methods are defined as nops in the case they are not
  * required.
  */
-#define raw_spin_trylock(lock)	__cond_lock(lock, _raw_spin_trylock(lock))
+#define __real_raw_spin_trylock(lock)	__cond_lock(lock, _raw_spin_trylock(lock))
+#define raw_spin_trylock(lock)		PICK_SPINOP_RET(_trylock, lock, int)
 
-#define raw_spin_lock(lock)	_raw_spin_lock(lock)
+#define __real_raw_spin_lock(lock)	_raw_spin_lock(lock)
+#define raw_spin_lock(lock)		PICK_SPINOP(_lock, lock)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 # define raw_spin_lock_nested(lock, subclass) \
@@ -244,7 +251,7 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 
-#define raw_spin_lock_irqsave(lock, flags)			\
+#define __real_raw_spin_lock_irqsave(lock, flags)	\
 	do {						\
 		typecheck(unsigned long, flags);	\
 		flags = _raw_spin_lock_irqsave(lock);	\
@@ -266,7 +273,7 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 
 #else
 
-#define raw_spin_lock_irqsave(lock, flags)		\
+#define __real_raw_spin_lock_irqsave(lock, flags)	\
 	do {						\
 		typecheck(unsigned long, flags);	\
 		_raw_spin_lock_irqsave(lock, flags);	\
@@ -277,34 +284,46 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 
 #endif
 
-#define raw_spin_lock_irq(lock)		_raw_spin_lock_irq(lock)
+#define raw_spin_lock_irqsave(lock, flags)  \
+	PICK_SPINLOCK_IRQSAVE(lock, flags)
+
+#define __real_raw_spin_lock_irq(lock)	_raw_spin_lock_irq(lock)
+#define raw_spin_lock_irq(lock)		PICK_SPINOP(_lock_irq, lock)
 #define raw_spin_lock_bh(lock)		_raw_spin_lock_bh(lock)
-#define raw_spin_unlock(lock)		_raw_spin_unlock(lock)
-#define raw_spin_unlock_irq(lock)	_raw_spin_unlock_irq(lock)
+#define __real_raw_spin_unlock(lock)	_raw_spin_unlock(lock)
+#define raw_spin_unlock(lock)		PICK_SPINOP(_unlock, lock)
+#define __real_raw_spin_unlock_irq(lock) _raw_spin_unlock_irq(lock)
+#define raw_spin_unlock_irq(lock)	PICK_SPINOP(_unlock_irq, lock)
 
-#define raw_spin_unlock_irqrestore(lock, flags)		\
+#define __real_raw_spin_unlock_irqrestore(lock, flags)		\
 	do {							\
 		typecheck(unsigned long, flags);		\
 		_raw_spin_unlock_irqrestore(lock, flags);	\
 	} while (0)
+#define raw_spin_unlock_irqrestore(lock, flags)	\
+	PICK_SPINUNLOCK_IRQRESTORE(lock, flags)
+
 #define raw_spin_unlock_bh(lock)	_raw_spin_unlock_bh(lock)
 
 #define raw_spin_trylock_bh(lock) \
 	__cond_lock(lock, _raw_spin_trylock_bh(lock))
 
-#define raw_spin_trylock_irq(lock) \
+#define __real_raw_spin_trylock_irq(lock) \
 ({ \
 	local_irq_disable(); \
-	raw_spin_trylock(lock) ? \
+	__real_raw_spin_trylock(lock) ? \
 	1 : ({ local_irq_enable(); 0;  }); \
 })
+#define raw_spin_trylock_irq(lock)	PICK_SPINTRYLOCK_IRQ(lock)
 
-#define raw_spin_trylock_irqsave(lock, flags) \
+#define __real_raw_spin_trylock_irqsave(lock, flags) \
 ({ \
 	local_irq_save(flags); \
 	raw_spin_trylock(lock) ? \
 	1 : ({ local_irq_restore(flags); 0; }); \
 })
+#define raw_spin_trylock_irqsave(lock, flags)	\
+	PICK_SPINTRYLOCK_IRQSAVE(lock, flags)
 
 /* Include rwlock functions */
 #include <linux/rwlock.h>
@@ -329,24 +348,17 @@ static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
 
 #define spin_lock_init(_lock)				\
 do {							\
-	spinlock_check(_lock);				\
-	raw_spin_lock_init(&(_lock)->rlock);		\
+	raw_spin_lock_init(_lock);			\
 } while (0)
 
-static __always_inline void spin_lock(spinlock_t *lock)
-{
-	raw_spin_lock(&lock->rlock);
-}
+#define spin_lock(lock)		raw_spin_lock(lock)
 
 static __always_inline void spin_lock_bh(spinlock_t *lock)
 {
 	raw_spin_lock_bh(&lock->rlock);
 }
 
-static __always_inline int spin_trylock(spinlock_t *lock)
-{
-	return raw_spin_trylock(&lock->rlock);
-}
+#define spin_trylock(lock)	raw_spin_trylock(lock)
 
 #define spin_lock_nested(lock, subclass)			\
 do {								\
@@ -358,14 +370,11 @@ do {									\
 	raw_spin_lock_nest_lock(spinlock_check(lock), nest_lock);	\
 } while (0)
 
-static __always_inline void spin_lock_irq(spinlock_t *lock)
-{
-	raw_spin_lock_irq(&lock->rlock);
-}
+#define spin_lock_irq(lock)	raw_spin_lock_irq(lock)
 
 #define spin_lock_irqsave(lock, flags)				\
 do {								\
-	raw_spin_lock_irqsave(spinlock_check(lock), flags);	\
+	raw_spin_lock_irqsave(lock, flags);			\
 } while (0)
 
 #define spin_lock_irqsave_nested(lock, flags, subclass)			\
@@ -373,39 +382,28 @@ do {									\
 	raw_spin_lock_irqsave_nested(spinlock_check(lock), flags, subclass); \
 } while (0)
 
-static __always_inline void spin_unlock(spinlock_t *lock)
-{
-	raw_spin_unlock(&lock->rlock);
-}
+#define spin_unlock(lock)	raw_spin_unlock(lock)
 
 static __always_inline void spin_unlock_bh(spinlock_t *lock)
 {
 	raw_spin_unlock_bh(&lock->rlock);
 }
 
-static __always_inline void spin_unlock_irq(spinlock_t *lock)
-{
-	raw_spin_unlock_irq(&lock->rlock);
-}
+#define spin_unlock_irq(lock)	raw_spin_unlock_irq(lock)
 
-static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
-{
-	raw_spin_unlock_irqrestore(&lock->rlock, flags);
-}
+#define spin_unlock_irqrestore(lock, flags)	\
+	raw_spin_unlock_irqrestore(lock, flags)
 
 static __always_inline int spin_trylock_bh(spinlock_t *lock)
 {
 	return raw_spin_trylock_bh(&lock->rlock);
 }
 
-static __always_inline int spin_trylock_irq(spinlock_t *lock)
-{
-	return raw_spin_trylock_irq(&lock->rlock);
-}
+#define spin_trylock_irq(lock)	raw_spin_trylock_irq(lock)
 
 #define spin_trylock_irqsave(lock, flags)			\
 ({								\
-	raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
+	raw_spin_trylock_irqsave(lock, flags);			\
 })
 
 /**
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index b762eaba4cdf..5098b836e866 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -99,7 +99,9 @@ static inline int __raw_spin_trylock(raw_spinlock_t *lock)
  * even on CONFIG_PREEMPTION, because lockdep assumes that interrupts are
  * not re-enabled during lock-acquire (which the preempt-spin-ops do):
  */
-#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+#if !defined(CONFIG_GENERIC_LOCKBREAK) ||	\
+	defined(CONFIG_DEBUG_LOCK_ALLOC) ||	\
+	defined(CONFIG_IPIPE)
 
 static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
 {
@@ -113,7 +115,7 @@ static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
 	 * do_raw_spin_lock_flags() code, because lockdep assumes
 	 * that interrupts are not re-enabled during lock-acquire:
 	 */
-#ifdef CONFIG_LOCKDEP
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_IPIPE)
 	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
 #else
 	do_raw_spin_lock_flags(lock, &flags);
diff --git a/include/linux/spinlock_up.h b/include/linux/spinlock_up.h
index 0ac9112c1bbe..b8c6c6d477d6 100644
--- a/include/linux/spinlock_up.h
+++ b/include/linux/spinlock_up.h
@@ -48,16 +48,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 	lock->slock = 1;
 }
 
-/*
- * Read-write spinlocks. No debug version.
- */
-#define arch_read_lock(lock)		do { barrier(); (void)(lock); } while (0)
-#define arch_write_lock(lock)		do { barrier(); (void)(lock); } while (0)
-#define arch_read_trylock(lock)	({ barrier(); (void)(lock); 1; })
-#define arch_write_trylock(lock)	({ barrier(); (void)(lock); 1; })
-#define arch_read_unlock(lock)		do { barrier(); (void)(lock); } while (0)
-#define arch_write_unlock(lock)	do { barrier(); (void)(lock); } while (0)
-
 #else /* DEBUG_SPINLOCK */
 #define arch_spin_is_locked(lock)	((void)(lock), 0)
 /* for sched/core.c and kernel_lock.c: */
@@ -67,6 +57,13 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 # define arch_spin_trylock(lock)	({ barrier(); (void)(lock); 1; })
 #endif /* DEBUG_SPINLOCK */
 
+#define arch_read_lock(lock)		do { barrier(); (void)(lock); } while (0)
+#define arch_write_lock(lock)		do { barrier(); (void)(lock); } while (0)
+#define arch_read_trylock(lock)	({ barrier(); (void)(lock); 1; })
+#define arch_write_trylock(lock)	({ barrier(); (void)(lock); 1; })
+#define arch_read_unlock(lock)		do { barrier(); (void)(lock); } while (0)
+#define arch_write_unlock(lock)	do { barrier(); (void)(lock); } while (0)
+
 #define arch_spin_is_contended(lock)	(((void)(lock), 0))
 
 #endif /* __LINUX_SPINLOCK_UP_H */
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index f9a0c6189852..1695d3cfc154 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -139,13 +139,17 @@ int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 				   const struct cpumask *cpus);
 #else	/* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
 
+#include <linux/interrupt.h>
+
 static inline int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
 					  const struct cpumask *cpus)
 {
 	unsigned long flags;
 	int ret;
 	local_irq_save(flags);
+	hard_irq_disable();
 	ret = fn(data);
+	hard_irq_enable();
 	local_irq_restore(flags);
 	return ret;
 }
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 57ce5af258a3..ff8f2fe943f1 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -19,6 +19,7 @@
 #include <linux/cpumask.h>
 #include <linux/rcupdate.h>
 #include <linux/tracepoint-defs.h>
+#include <linux/ipipe.h>
 
 struct module;
 struct tracepoint;
@@ -210,7 +211,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
-				TP_CONDITION(cond), 1);			\
+				TP_CONDITION(cond), ipipe_root_p);	\
 	}
 #else
 #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)
diff --git a/init/Kconfig b/init/Kconfig
index 96fc45d1b686..e7c80ce6625a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1399,6 +1399,18 @@ config PRINTK_NMI
 	depends on PRINTK
 	depends on HAVE_NMI
 
+config RAW_PRINTK
+       bool "Enable support for raw printk"
+       default n
+       help
+         This option enables a printk variant called raw_printk() for
+         writing all output unmodified to a raw console channel
+         immediately, without any header or preparation whatsoever,
+         usable from any context.
+
+         Unlike early_printk() console devices, raw_printk() devices
+         can live past the boot sequence.
+
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/init/main.c b/init/main.c
index fef9e610b74b..594a510c4e9d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -47,6 +47,7 @@
 #include <linux/cpuset.h>
 #include <linux/cgroup.h>
 #include <linux/efi.h>
+#include <linux/ipipe.h>
 #include <linux/tick.h>
 #include <linux/sched/isolation.h>
 #include <linux/interrupt.h>
@@ -585,7 +586,7 @@ asmlinkage __visible void __init start_kernel(void)
 
 	cgroup_init_early();
 
-	local_irq_disable();
+	hard_local_irq_disable();
 	early_boot_irqs_disabled = true;
 
 	/*
@@ -625,6 +626,7 @@ asmlinkage __visible void __init start_kernel(void)
 	setup_log_buf(0);
 	vfs_caches_init_early();
 	sort_main_extable();
+	__ipipe_init_early();
 	trap_init();
 	mm_init();
 
@@ -695,6 +697,11 @@ asmlinkage __visible void __init start_kernel(void)
 	boot_init_stack_canary();
 
 	time_init();
+	/*
+	 * We need to wait for the interrupt and time subsystems to be
+	 * initialized before enabling the pipeline.
+	 */
+	__ipipe_init();
 	perf_event_init();
 	profile_init();
 	call_function_init();
@@ -1029,6 +1036,7 @@ static void __init do_basic_setup(void)
 	cpuset_init_smp();
 	driver_init();
 	init_irq_proc();
+	__ipipe_init_proc();
 	do_ctors();
 	usermodehelper_enable();
 	do_initcalls();
diff --git a/kernel/Makefile b/kernel/Makefile
index f2cc0d118a0b..a4530baf7d4c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -88,6 +88,7 @@ obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
+obj-$(CONFIG_IPIPE) += ipipe/
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index be01a4d627c9..6fc82f6d3c33 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -114,7 +114,7 @@ void context_tracking_enter(enum ctx_state state)
 	 * helpers are enough to protect RCU uses inside the exception. So
 	 * just return immediately if we detect we are in an IRQ.
 	 */
-	if (in_interrupt())
+	if (!ipipe_root_p || in_interrupt())
 		return;
 
 	local_irq_save(flags);
@@ -170,7 +170,7 @@ void context_tracking_exit(enum ctx_state state)
 {
 	unsigned long flags;
 
-	if (in_interrupt())
+	if (!ipipe_root_p || in_interrupt())
 		return;
 
 	local_irq_save(flags);
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 097ab02989f9..91e34081688c 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -112,8 +112,8 @@ static struct kgdb_bkpt		kgdb_break[KGDB_MAX_BREAKPOINTS] = {
  */
 atomic_t			kgdb_active = ATOMIC_INIT(-1);
 EXPORT_SYMBOL_GPL(kgdb_active);
-static DEFINE_RAW_SPINLOCK(dbg_master_lock);
-static DEFINE_RAW_SPINLOCK(dbg_slave_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(dbg_master_lock);
+static IPIPE_DEFINE_RAW_SPINLOCK(dbg_slave_lock);
 
 /*
  * We use NR_CPUs not PERCPU, in case kgdb is used to debug early
@@ -511,7 +511,9 @@ static int kgdb_reenter_check(struct kgdb_state *ks)
 static void dbg_touch_watchdogs(void)
 {
 	touch_softlockup_watchdog_sync();
+#ifndef CONFIG_IPIPE
 	clocksource_touch_watchdog();
+#endif
 	rcu_cpu_stall_reset();
 }
 
@@ -543,7 +545,7 @@ acquirelock:
 	 * Interrupts will be restored by the 'trap return' code, except when
 	 * single stepping.
 	 */
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 
 	cpu = ks->cpu;
 	kgdb_info[cpu].debuggerinfo = regs;
@@ -594,7 +596,7 @@ return_normal:
 			smp_mb__before_atomic();
 			atomic_dec(&slaves_in_kgdb);
 			dbg_touch_watchdogs();
-			local_irq_restore(flags);
+			hard_local_irq_restore(flags);
 			rcu_read_unlock();
 			return 0;
 		}
@@ -613,7 +615,7 @@ return_normal:
 		atomic_set(&kgdb_active, -1);
 		raw_spin_unlock(&dbg_master_lock);
 		dbg_touch_watchdogs();
-		local_irq_restore(flags);
+		hard_local_irq_restore(flags);
 		rcu_read_unlock();
 
 		goto acquirelock;
@@ -737,7 +739,7 @@ kgdb_restore:
 	atomic_set(&kgdb_active, -1);
 	raw_spin_unlock(&dbg_master_lock);
 	dbg_touch_watchdogs();
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 	rcu_read_unlock();
 
 	return kgdb_info[cpu].ret_state;
@@ -856,9 +858,9 @@ static void kgdb_console_write(struct console *co, const char *s,
 	if (!kgdb_connected || atomic_read(&kgdb_active) != -1 || dbg_kdb_mode)
 		return;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	gdbstub_msg_write(s, count);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 static struct console kgdbcons = {
diff --git a/kernel/exit.c b/kernel/exit.c
index fa46977b9c07..d0535466f7bf 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -57,6 +57,7 @@
 #include <trace/events/sched.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/oom.h>
+#include <linux/ipipe.h>
 #include <linux/writeback.h>
 #include <linux/shm.h>
 #include <linux/kcov.h>
@@ -762,6 +763,7 @@ void __noreturn do_exit(long code)
 	}
 
 	exit_signals(tsk);  /* sets PF_EXITING */
+	__ipipe_report_exit(tsk);
 
 	/* sync mm's RSS info before statistics gathering */
 	if (tsk->mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index 419fff8eb9e5..2133ee5f0f00 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -57,6 +57,7 @@
 #include <linux/futex.h>
 #include <linux/compat.h>
 #include <linux/kthread.h>
+#include <linux/ipipe.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/rcupdate.h>
 #include <linux/ptrace.h>
@@ -93,6 +94,7 @@
 #include <linux/kcov.h>
 #include <linux/livepatch.h>
 #include <linux/thread_info.h>
+#include <ipipe/thread_info.h>
 #include <linux/stackleak.h>
 
 #include <asm/pgtable.h>
@@ -904,6 +906,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 
 	setup_thread_stack(tsk, orig);
+#ifdef CONFIG_IPIPE
+	__ipipe_init_threadflags(task_thread_info(tsk));
+	__ipipe_init_threadinfo(&task_thread_info(tsk)->ipipe_data);
+#endif
 	clear_user_return_notifier(tsk);
 	clear_tsk_need_resched(tsk);
 	set_task_stack_end_magic(tsk);
@@ -1076,6 +1082,7 @@ static inline void __mmput(struct mm_struct *mm)
 	exit_aio(mm);
 	ksm_exit(mm);
 	khugepaged_exit(mm); /* must run before exit_mmap */
+	__ipipe_report_cleanup(mm);
 	exit_mmap(mm);
 	mm_put_huge_zero_page(mm);
 	set_mm_exe_file(mm, NULL);
diff --git a/kernel/ipipe/Kconfig b/kernel/ipipe/Kconfig
new file mode 100644
index 000000000000..d17edb89f1f5
--- /dev/null
+++ b/kernel/ipipe/Kconfig
@@ -0,0 +1,47 @@
+
+config HAVE_IPIPE_SUPPORT
+       depends on GENERIC_CLOCKEVENTS
+       bool
+
+config IPIPE
+	bool "Interrupt pipeline"
+	depends on HAVE_IPIPE_SUPPORT
+	default n
+	---help---
+	  Activate this option if you want the interrupt pipeline to be
+	  compiled in.
+
+config IPIPE_CORE
+	def_bool y if IPIPE
+
+config IPIPE_WANT_PTE_PINNING
+       bool
+
+config IPIPE_CORE_APIREV
+       int
+       depends on IPIPE
+       default 2
+	---help---
+	  The API revision level we implement.
+
+config IPIPE_WANT_APIREV_2
+       bool
+
+config IPIPE_TARGET_APIREV
+       int
+       depends on IPIPE
+       default IPIPE_CORE_APIREV
+	---help---
+	  The API revision level we want (must be <=
+	  IPIPE_CORE_APIREV).
+
+config IPIPE_HAVE_HOSTRT
+       bool
+
+config IPIPE_HAVE_EAGER_FPU
+	bool
+
+if IPIPE && ARM && RAW_PRINTK && !DEBUG_LL
+comment "CAUTION: DEBUG_LL must be selected, and properly configured for"
+comment "RAW_PRINTK to work. Otherwise, you will get no output on raw_printk()"
+endif
diff --git a/kernel/ipipe/Kconfig.debug b/kernel/ipipe/Kconfig.debug
new file mode 100644
index 000000000000..d1894cf62d54
--- /dev/null
+++ b/kernel/ipipe/Kconfig.debug
@@ -0,0 +1,100 @@
+config IPIPE_DEBUG
+	bool "I-pipe debugging"
+	depends on IPIPE
+	select RAW_PRINTK
+
+config IPIPE_DEBUG_CONTEXT
+	bool "Check for illicit cross-domain calls"
+	depends on IPIPE_DEBUG
+	default y
+	---help---
+	  Enable this feature to arm checkpoints in the kernel that
+	  verify the correct invocation context. On entry to critical
+	  Linux services, a warning is issued if the caller is not
+	  running over the root domain.
+
+config IPIPE_DEBUG_INTERNAL
+	bool "Enable internal debug checks"
+	depends on IPIPE_DEBUG
+	default y
+	---help---
+	  When this feature is enabled, I-pipe will perform internal
+	  consistency checks of its subsystems, e.g. on per-cpu variable
+	  access.
+
+config HAVE_IPIPE_TRACER_SUPPORT
+       bool
+
+config IPIPE_TRACE
+	bool "Latency tracing"
+	depends on HAVE_IPIPE_TRACER_SUPPORT
+	depends on IPIPE_DEBUG
+	select FTRACE
+	select FUNCTION_TRACER
+	select KALLSYMS
+	select PROC_FS
+	---help---
+	  Activate this option if you want to use per-function tracing of
+	  the kernel. The tracer will collect data via instrumentation
+	  features like the one below or with the help of explicit calls
+	  to ipipe_trace_xxx(). See include/linux/ipipe_trace.h for the
+	  in-kernel tracing API. The collected data and runtime control
+	  are available via /proc/ipipe/trace/*.
+
+if IPIPE_TRACE
+
+config IPIPE_TRACE_ENABLE
+	bool "Enable tracing on boot"
+	default y
+	---help---
+	  Disable this option if you want to arm the tracer after booting
+	  manually ("echo 1 > /proc/ipipe/tracer/enable"). This can reduce
+	  boot time on slow embedded devices due to the tracer overhead.
+
+config IPIPE_TRACE_MCOUNT
+	bool "Instrument function entries"
+	default y
+	select FTRACE
+	select FUNCTION_TRACER
+	---help---
+	  When enabled, records every kernel function entry in the tracer
+	  log. While this slows down the system noticeably, it provides
+	  the highest level of information about the flow of events.
+	  However, it can be switched off in order to record only explicit
+	  I-pipe trace points.
+
+config IPIPE_TRACE_IRQSOFF
+	bool "Trace IRQs-off times"
+	default y
+	---help---
+	  Activate this option if the I-pipe shall trace the longest path
+	  executed with hard IRQs switched off.
+
+config IPIPE_TRACE_SHIFT
+	int "Depth of trace log (14 => 16Kpoints, 15 => 32Kpoints)"
+	range 10 18
+	default 14
+	---help---
+	  The number of trace points to hold tracing data for each
+	  trace path, as a power of 2.
+
+config IPIPE_TRACE_VMALLOC
+	bool "Use vmalloc'ed trace buffer"
+	default y if EMBEDDED
+	---help---
+	  Instead of reserving static kernel data, the required buffer
+	  is allocated via vmalloc during boot-up when this option is
+	  enabled. This can help to start systems that are low on memory,
+	  but it slightly degrades overall performance. Try this option
+	  when a traced kernel hangs unexpectedly at boot time.
+
+config IPIPE_TRACE_PANIC
+	bool "Enable panic back traces"
+	default y
+	---help---
+	  Provides services to freeze and dump a back trace on panic
+	  situations. This is used on IPIPE_DEBUG_CONTEXT exceptions
+	  as well as ordinary kernel oopses. You can control the number
+	  of printed back trace points via /proc/ipipe/trace.
+
+endif
diff --git a/kernel/ipipe/Makefile b/kernel/ipipe/Makefile
new file mode 100644
index 000000000000..73755150634f
--- /dev/null
+++ b/kernel/ipipe/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_IPIPE)	+= core.o timer.o
+obj-$(CONFIG_IPIPE_TRACE) += tracer.o
diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c
new file mode 100644
index 000000000000..60f74291f22a
--- /dev/null
+++ b/kernel/ipipe/core.c
@@ -0,0 +1,2117 @@
+/* -*- linux-c -*-
+ * linux/kernel/ipipe/core.c
+ *
+ * Copyright (C) 2002-2012 Philippe Gerum.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Architecture-independent I-PIPE core support.
+ */
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/sched/debug.h>
+#include <linux/kallsyms.h>
+#include <linux/bitops.h>
+#include <linux/tick.h>
+#include <linux/interrupt.h>
+#include <linux/uaccess.h>
+#include <linux/cpuidle.h>
+#include <linux/sched/idle.h>
+#ifdef CONFIG_PROC_FS
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#endif	/* CONFIG_PROC_FS */
+#include <linux/ipipe_trace.h>
+#include <linux/ipipe.h>
+#include <ipipe/setup.h>
+#include <asm/syscall.h>
+#include <asm/unistd.h>
+
+struct ipipe_domain ipipe_root;
+EXPORT_SYMBOL_GPL(ipipe_root);
+
+struct ipipe_domain *ipipe_head_domain = &ipipe_root;
+EXPORT_SYMBOL_GPL(ipipe_head_domain);
+
+#ifdef CONFIG_SMP
+static __initdata struct ipipe_percpu_domain_data bootup_context = {
+	.status = IPIPE_STALL_MASK,
+	.domain = &ipipe_root,
+};
+#else
+#define bootup_context ipipe_percpu.root
+#endif	/* !CONFIG_SMP */
+
+DEFINE_PER_CPU(struct ipipe_percpu_data, ipipe_percpu) = {
+	.root = {
+		.status = IPIPE_STALL_MASK,
+		.domain = &ipipe_root,
+	},
+	.curr = &bootup_context,
+	.hrtimer_irq = -1,
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+	.context_check = 1,
+#endif
+};
+EXPORT_PER_CPU_SYMBOL(ipipe_percpu);
+
+/* Up to 2k of pending work data per CPU. */
+#define WORKBUF_SIZE 2048
+static DEFINE_PER_CPU_ALIGNED(unsigned char[WORKBUF_SIZE], work_buf);
+static DEFINE_PER_CPU(void *, work_tail);
+static unsigned int __ipipe_work_virq;
+
+static void __ipipe_do_work(unsigned int virq, void *cookie);
+
+#ifdef CONFIG_SMP
+
+#define IPIPE_CRITICAL_TIMEOUT	1000000
+static cpumask_t __ipipe_cpu_sync_map;
+static cpumask_t __ipipe_cpu_lock_map;
+static cpumask_t __ipipe_cpu_pass_map;
+static unsigned long __ipipe_critical_lock;
+static IPIPE_DEFINE_SPINLOCK(__ipipe_cpu_barrier);
+static atomic_t __ipipe_critical_count = ATOMIC_INIT(0);
+static void (*__ipipe_cpu_sync) (void);
+
+#else /* !CONFIG_SMP */
+/*
+ * Create an alias to the unique root status, so that arch-dep code
+ * may get fast access to this percpu variable including from
+ * assembly.  A hard-coded assumption is that root.status appears at
+ * offset #0 of the ipipe_percpu struct.
+ */
+extern unsigned long __ipipe_root_status
+__attribute__((alias(__stringify(ipipe_percpu))));
+EXPORT_SYMBOL(__ipipe_root_status);
+
+#endif /* !CONFIG_SMP */
+
+IPIPE_DEFINE_SPINLOCK(__ipipe_lock);
+
+static unsigned long __ipipe_virtual_irq_map;
+
+#ifdef CONFIG_PRINTK
+unsigned int __ipipe_printk_virq;
+int __ipipe_printk_bypass;
+#endif /* CONFIG_PRINTK */
+
+#ifdef CONFIG_PROC_FS
+
+struct proc_dir_entry *ipipe_proc_root;
+
+static int __ipipe_version_info_show(struct seq_file *p, void *data)
+{
+	seq_printf(p, "%d\n", IPIPE_CORE_RELEASE);
+	return 0;
+}
+
+static int __ipipe_version_info_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, __ipipe_version_info_show, NULL);
+}
+
+static const struct file_operations __ipipe_version_proc_ops = {
+	.open		= __ipipe_version_info_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int __ipipe_common_info_show(struct seq_file *p, void *data)
+{
+	struct ipipe_domain *ipd = (struct ipipe_domain *)p->private;
+	char handling, lockbit, virtuality;
+	unsigned long ctlbits;
+	unsigned int irq;
+
+	seq_printf(p, "        +--- Handled\n");
+	seq_printf(p, "        |+-- Locked\n");
+	seq_printf(p, "        ||+- Virtual\n");
+	seq_printf(p, " [IRQ]  |||  Handler\n");
+
+	mutex_lock(&ipd->mutex);
+
+	for (irq = 0; irq < IPIPE_NR_IRQS; irq++) {
+		ctlbits = ipd->irqs[irq].control;
+		/*
+		 * There might be a hole between the last external IRQ
+		 * and the first virtual one; skip it.
+		 */
+		if (irq >= IPIPE_NR_XIRQS && !ipipe_virtual_irq_p(irq))
+			continue;
+
+		if (ipipe_virtual_irq_p(irq)
+		    && !test_bit(irq - IPIPE_VIRQ_BASE, &__ipipe_virtual_irq_map))
+			/* Non-allocated virtual IRQ; skip it. */
+			continue;
+
+		if (ctlbits & IPIPE_HANDLE_MASK)
+			handling = 'H';
+		else
+			handling = '.';
+
+		if (ctlbits & IPIPE_LOCK_MASK)
+			lockbit = 'L';
+		else
+			lockbit = '.';
+
+		if (ipipe_virtual_irq_p(irq))
+			virtuality = 'V';
+		else
+			virtuality = '.';
+
+		if (ctlbits & IPIPE_HANDLE_MASK)
+			seq_printf(p, " %4u:  %c%c%c  %pf\n",
+				   irq, handling, lockbit, virtuality,
+				   ipd->irqs[irq].handler);
+		else
+			seq_printf(p, " %4u:  %c%c%c\n",
+				   irq, handling, lockbit, virtuality);
+	}
+
+	mutex_unlock(&ipd->mutex);
+
+	return 0;
+}
+
+static int __ipipe_common_info_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, __ipipe_common_info_show, PDE_DATA(inode));
+}
+
+static const struct file_operations __ipipe_info_proc_ops = {
+	.owner		= THIS_MODULE,
+	.open		= __ipipe_common_info_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+void add_domain_proc(struct ipipe_domain *ipd)
+{
+	proc_create_data(ipd->name, 0444, ipipe_proc_root,
+			 &__ipipe_info_proc_ops, ipd);
+}
+
+void remove_domain_proc(struct ipipe_domain *ipd)
+{
+	remove_proc_entry(ipd->name, ipipe_proc_root);
+}
+
+void __init __ipipe_init_proc(void)
+{
+	ipipe_proc_root = proc_mkdir("ipipe", NULL);
+	proc_create("version", 0444, ipipe_proc_root,
+		    &__ipipe_version_proc_ops);
+	add_domain_proc(ipipe_root_domain);
+
+	__ipipe_init_tracer();
+}
+
+#else
+
+static inline void add_domain_proc(struct ipipe_domain *ipd)
+{
+}
+
+static inline void remove_domain_proc(struct ipipe_domain *ipd)
+{
+}
+
+#endif	/* CONFIG_PROC_FS */
+
+static void init_stage(struct ipipe_domain *ipd)
+{
+	memset(&ipd->irqs, 0, sizeof(ipd->irqs));
+	mutex_init(&ipd->mutex);
+	__ipipe_hook_critical_ipi(ipd);
+}
+
+static inline int root_context_offset(void)
+{
+	void root_context_not_at_start_of_ipipe_percpu(void);
+
+	/* ipipe_percpu.root must be found at offset #0. */
+
+	if (offsetof(struct ipipe_percpu_data, root))
+		root_context_not_at_start_of_ipipe_percpu();
+
+	return 0;
+}
+
+#ifdef CONFIG_SMP
+
+static inline void fixup_percpu_data(void)
+{
+	struct ipipe_percpu_data *p;
+	int cpu;
+
+	/*
+	 * ipipe_percpu.curr cannot be assigned statically to
+	 * &ipipe_percpu.root, due to the dynamic nature of percpu
+	 * data. So we make ipipe_percpu.curr refer to a temporary
+	 * boot up context in static memory, until we can fixup all
+	 * context pointers in this routine, after per-cpu areas have
+	 * been eventually set up. The temporary context data is
+	 * copied to per_cpu(ipipe_percpu, 0).root in the same move.
+	 *
+	 * Obviously, this code must run over the boot CPU, before SMP
+	 * operations start.
+	 */
+	BUG_ON(smp_processor_id() || !irqs_disabled());
+
+	per_cpu(ipipe_percpu, 0).root = bootup_context;
+
+	for_each_possible_cpu(cpu) {
+		p = &per_cpu(ipipe_percpu, cpu);
+		p->curr = &p->root;
+	}
+}
+
+#else /* !CONFIG_SMP */
+
+static inline void fixup_percpu_data(void) { }
+
+#endif /* CONFIG_SMP */
+
+void __init __ipipe_init_early(void)
+{
+	struct ipipe_domain *ipd = &ipipe_root;
+	int cpu;
+
+	fixup_percpu_data();
+
+	/*
+	 * A lightweight registration code for the root domain. We are
+	 * running on the boot CPU, hw interrupts are off, and
+	 * secondary CPUs are still lost in space.
+	 */
+	ipd->name = "Linux";
+	ipd->context_offset = root_context_offset();
+	init_stage(ipd);
+
+	/*
+	 * Do the early init stuff. First we do the per-arch pipeline
+	 * core setup, then we run the per-client setup code. At this
+	 * point, the kernel does not provide many services yet: be
+	 * careful.
+	 */
+	__ipipe_early_core_setup();
+	__ipipe_early_client_setup();
+
+#ifdef CONFIG_PRINTK
+	__ipipe_printk_virq = ipipe_alloc_virq();
+	ipd->irqs[__ipipe_printk_virq].handler = __ipipe_flush_printk;
+	ipd->irqs[__ipipe_printk_virq].cookie = NULL;
+	ipd->irqs[__ipipe_printk_virq].ackfn = NULL;
+	ipd->irqs[__ipipe_printk_virq].control = IPIPE_HANDLE_MASK;
+#endif /* CONFIG_PRINTK */
+
+	__ipipe_work_virq = ipipe_alloc_virq();
+	ipd->irqs[__ipipe_work_virq].handler = __ipipe_do_work;
+	ipd->irqs[__ipipe_work_virq].cookie = NULL;
+	ipd->irqs[__ipipe_work_virq].ackfn = NULL;
+	ipd->irqs[__ipipe_work_virq].control = IPIPE_HANDLE_MASK;
+
+	for_each_possible_cpu(cpu)
+		per_cpu(work_tail, cpu) = per_cpu(work_buf, cpu);
+}
+
+void __init __ipipe_init(void)
+{
+	/* Now we may engage the pipeline. */
+	__ipipe_enable_pipeline();
+
+	pr_info("Interrupt pipeline (release #%d)\n", IPIPE_CORE_RELEASE);
+}
+
+static inline void init_head_stage(struct ipipe_domain *ipd)
+{
+	struct ipipe_percpu_domain_data *p;
+	int cpu;
+
+	/* Must be set first, used in ipipe_percpu_context(). */
+	ipd->context_offset = offsetof(struct ipipe_percpu_data, head);
+
+	for_each_online_cpu(cpu) {
+		p = ipipe_percpu_context(ipd, cpu);
+		memset(p, 0, sizeof(*p));
+		p->domain = ipd;
+	}
+
+	init_stage(ipd);
+}
+
+void ipipe_register_head(struct ipipe_domain *ipd, const char *name)
+{
+	BUG_ON(!ipipe_root_p || ipd == &ipipe_root);
+
+	ipd->name = name;
+	init_head_stage(ipd);
+	barrier();
+	ipipe_head_domain = ipd;
+	add_domain_proc(ipd);
+
+	pr_info("I-pipe: head domain %s registered.\n", name);
+}
+EXPORT_SYMBOL_GPL(ipipe_register_head);
+
+void ipipe_unregister_head(struct ipipe_domain *ipd)
+{
+	BUG_ON(!ipipe_root_p || ipd != ipipe_head_domain);
+
+	ipipe_head_domain = &ipipe_root;
+	smp_mb();
+	mutex_lock(&ipd->mutex);
+	remove_domain_proc(ipd);
+	mutex_unlock(&ipd->mutex);
+
+	pr_info("I-pipe: head domain %s unregistered.\n", ipd->name);
+}
+EXPORT_SYMBOL_GPL(ipipe_unregister_head);
+
+void ipipe_stall_root(void)
+{
+	unsigned long flags;
+
+	ipipe_root_only();
+	flags = hard_smp_local_irq_save();
+	__set_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+	hard_smp_local_irq_restore(flags);
+}
+EXPORT_SYMBOL(ipipe_stall_root);
+
+unsigned long ipipe_test_and_stall_root(void)
+{
+	unsigned long flags;
+	int x;
+
+	ipipe_root_only();
+	flags = hard_smp_local_irq_save();
+	x = __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+	hard_smp_local_irq_restore(flags);
+
+	return x;
+}
+EXPORT_SYMBOL(ipipe_test_and_stall_root);
+
+unsigned long ipipe_test_root(void)
+{
+	unsigned long flags;
+	int x;
+
+	flags = hard_smp_local_irq_save();
+	x = test_bit(IPIPE_STALL_FLAG, &__ipipe_root_status);
+	hard_smp_local_irq_restore(flags);
+
+	return x;
+}
+EXPORT_SYMBOL(ipipe_test_root);
+
+void ipipe_unstall_root(void)
+{
+	struct ipipe_percpu_domain_data *p;
+
+	hard_local_irq_disable();
+
+	/* This helps catch bad usage from assembly call sites. */
+	ipipe_root_only();
+
+	p = ipipe_this_cpu_root_context();
+
+	__clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+	if (unlikely(__ipipe_ipending_p(p)))
+		__ipipe_sync_stage();
+
+	hard_local_irq_enable();
+}
+EXPORT_SYMBOL(ipipe_unstall_root);
+
+void ipipe_restore_root(unsigned long x)
+{
+	ipipe_root_only();
+
+	if (x)
+		ipipe_stall_root();
+	else
+		ipipe_unstall_root();
+}
+EXPORT_SYMBOL(ipipe_restore_root);
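+
+/*
+ * Usage sketch (illustration only, not part of the pipeline code):
+ * how a root-domain caller may serialize against root IRQs with the
+ * virtual masking calls above instead of hard-disabling interrupts,
+ * so that a registered head domain keeps receiving events.  The
+ * demo_* names are made up for this example.
+ *
+ *	static unsigned long demo_events;
+ *
+ *	static void demo_root_section(void)
+ *	{
+ *		unsigned long flags;
+ *
+ *		flags = ipipe_test_and_stall_root();
+ *		demo_events++;	// root IRQs are logged, not delivered here
+ *		ipipe_restore_root(flags);
+ *	}
+ */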
+
+void __ipipe_restore_root_nosync(unsigned long x)
+{
+	struct ipipe_percpu_domain_data *p = ipipe_this_cpu_root_context();
+
+	if (raw_irqs_disabled_flags(x)) {
+		__set_bit(IPIPE_STALL_FLAG, &p->status);
+		trace_hardirqs_off();
+	} else {
+		trace_hardirqs_on();
+		__clear_bit(IPIPE_STALL_FLAG, &p->status);
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_restore_root_nosync);
+
+void ipipe_unstall_head(void)
+{
+	struct ipipe_percpu_domain_data *p = ipipe_this_cpu_head_context();
+
+	hard_local_irq_disable();
+
+	__clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+	if (unlikely(__ipipe_ipending_p(p)))
+		__ipipe_sync_pipeline(ipipe_head_domain);
+
+	hard_local_irq_enable();
+}
+EXPORT_SYMBOL_GPL(ipipe_unstall_head);
+
+void __ipipe_restore_head(unsigned long x) /* hw interrupt off */
+{
+	struct ipipe_percpu_domain_data *p = ipipe_this_cpu_head_context();
+
+	if (x) {
+#ifdef CONFIG_DEBUG_KERNEL
+		static int warned;
+		if (!warned &&
+		    __test_and_set_bit(IPIPE_STALL_FLAG, &p->status)) {
+			/*
+			 * Already stalled, although ipipe_restore_head()
+			 * should have detected it? Warn once.
+			 */
+			hard_local_irq_enable();
+			warned = 1;
+			pr_warn("I-pipe: ipipe_restore_head() optimization failed.\n");
+			dump_stack();
+			hard_local_irq_disable();
+		}
+#else /* !CONFIG_DEBUG_KERNEL */
+		__set_bit(IPIPE_STALL_FLAG, &p->status);
+#endif /* CONFIG_DEBUG_KERNEL */
+	} else {
+		__clear_bit(IPIPE_STALL_FLAG, &p->status);
+		if (unlikely(__ipipe_ipending_p(p)))
+			__ipipe_sync_pipeline(ipipe_head_domain);
+		hard_local_irq_enable();
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_restore_head);
+
+void __ipipe_spin_lock_irq(ipipe_spinlock_t *lock)
+{
+	hard_local_irq_disable();
+	if (ipipe_smp_p)
+		arch_spin_lock(&lock->arch_lock);
+	__set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_lock_irq);
+
+void __ipipe_spin_unlock_irq(ipipe_spinlock_t *lock)
+{
+	if (ipipe_smp_p)
+		arch_spin_unlock(&lock->arch_lock);
+	__clear_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+	hard_local_irq_enable();
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_unlock_irq);
+
+unsigned long __ipipe_spin_lock_irqsave(ipipe_spinlock_t *lock)
+{
+	unsigned long flags;
+	int s;
+
+	flags = hard_local_irq_save();
+	if (ipipe_smp_p)
+		arch_spin_lock(&lock->arch_lock);
+	s = __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+
+	return arch_mangle_irq_bits(s, flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_lock_irqsave);
+
+int __ipipe_spin_trylock_irqsave(ipipe_spinlock_t *lock,
+				 unsigned long *x)
+{
+	unsigned long flags;
+	int s;
+
+	flags = hard_local_irq_save();
+	if (ipipe_smp_p && !arch_spin_trylock(&lock->arch_lock)) {
+		hard_local_irq_restore(flags);
+		return 0;
+	}
+	s = __test_and_set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+	*x = arch_mangle_irq_bits(s, flags);
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_trylock_irqsave);
+
+void __ipipe_spin_unlock_irqrestore(ipipe_spinlock_t *lock,
+				    unsigned long x)
+{
+	if (ipipe_smp_p)
+		arch_spin_unlock(&lock->arch_lock);
+	if (!arch_demangle_irq_bits(&x))
+		__clear_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+	hard_local_irq_restore(x);
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_unlock_irqrestore);
+
+int __ipipe_spin_trylock_irq(ipipe_spinlock_t *lock)
+{
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+	if (ipipe_smp_p && !arch_spin_trylock(&lock->arch_lock)) {
+		hard_local_irq_restore(flags);
+		return 0;
+	}
+	__set_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(__ipipe_spin_trylock_irq);
+
+void __ipipe_spin_unlock_irqbegin(ipipe_spinlock_t *lock)
+{
+	if (ipipe_smp_p)
+		arch_spin_unlock(&lock->arch_lock);
+}
+
+void __ipipe_spin_unlock_irqcomplete(unsigned long x)
+{
+	if (!arch_demangle_irq_bits(&x))
+		__clear_bit(IPIPE_STALL_FLAG, &__ipipe_current_context->status);
+	hard_local_irq_restore(x);
+}
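+
+/*
+ * Usage sketch (illustration only): a counter shared between the
+ * root and head domains, protected by the helpers above.  This
+ * assumes IPIPE_DEFINE_SPINLOCK() yields the ipipe_spinlock_t those
+ * helpers expect, as it does for __ipipe_lock above; the demo_*
+ * names are made up.
+ *
+ *	static IPIPE_DEFINE_SPINLOCK(demo_lock);
+ *	static unsigned long demo_count;
+ *
+ *	static void demo_update(void)
+ *	{
+ *		unsigned long flags;
+ *
+ *		flags = __ipipe_spin_lock_irqsave(&demo_lock);
+ *		demo_count++;	// hw IRQs off, safe against both domains
+ *		__ipipe_spin_unlock_irqrestore(&demo_lock, flags);
+ *	}
+ */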
+
+/* Must be called hw IRQs off. */
+static inline void __ipipe_set_irq_held(struct ipipe_percpu_domain_data *p,
+					unsigned int irq)
+{
+	__set_bit(irq, p->irqheld_map);
+	p->irqall[irq]++;
+}
+
+#if __IPIPE_IRQMAP_LEVELS == 4
+
+/* Must be called hw IRQs off. */
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq)
+{
+	struct ipipe_percpu_domain_data *p = ipipe_this_cpu_context(ipd);
+	int l0b, l1b, l2b;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	l0b = irq / (BITS_PER_LONG * BITS_PER_LONG * BITS_PER_LONG);
+	l1b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+	l2b = irq / BITS_PER_LONG;
+
+	if (likely(!test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))) {
+		__set_bit(l0b, &p->irqpend_0map);
+		__set_bit(l1b, p->irqpend_1map);
+		__set_bit(l2b, p->irqpend_2map);
+		__set_bit(irq, p->irqpend_map);
+	} else
+		__set_bit(irq, p->irqheld_map);
+
+	p->irqall[irq]++;
+}
+EXPORT_SYMBOL_GPL(__ipipe_set_irq_pending);
+
+/* Must be called hw IRQs off. */
+void __ipipe_lock_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = ipipe_root_domain;
+	struct ipipe_percpu_domain_data *p;
+	int l0b, l1b, l2b;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	/*
+	 * Interrupts requested by a registered head domain cannot be
+	 * locked, since this would make no sense: interrupts are
+	 * globally masked at CPU level when the head domain is
+	 * stalled, so there is no way we could encounter the
+	 * situation IRQ locks are handling.
+	 */
+	if (test_and_set_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+		return;
+
+	p = ipipe_this_cpu_context(ipd);
+	if (__test_and_clear_bit(irq, p->irqpend_map)) {
+		__set_bit(irq, p->irqheld_map);
+		l2b = irq / BITS_PER_LONG;
+		if (p->irqpend_map[l2b] == 0) {
+			__clear_bit(l2b, p->irqpend_2map);
+			l1b = l2b / BITS_PER_LONG;
+			if (p->irqpend_2map[l1b] == 0) {
+				__clear_bit(l1b, p->irqpend_1map);
+				l0b = l1b / BITS_PER_LONG;
+				if (p->irqpend_1map[l0b] == 0)
+					__clear_bit(l0b, &p->irqpend_0map);
+			}
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_lock_irq);
+
+/* Must be called hw IRQs off. */
+void __ipipe_unlock_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = ipipe_root_domain;
+	struct ipipe_percpu_domain_data *p;
+	int l0b, l1b, l2b, cpu;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	if (!test_and_clear_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+		return;
+
+	l0b = irq / (BITS_PER_LONG * BITS_PER_LONG * BITS_PER_LONG);
+	l1b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+	l2b = irq / BITS_PER_LONG;
+
+	for_each_online_cpu(cpu) {
+		p = ipipe_percpu_context(ipd, cpu);
+		if (test_and_clear_bit(irq, p->irqheld_map)) {
+			/* We need atomic ops here: */
+			set_bit(irq, p->irqpend_map);
+			set_bit(l2b, p->irqpend_2map);
+			set_bit(l1b, p->irqpend_1map);
+			set_bit(l0b, &p->irqpend_0map);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_unlock_irq);
+
+#define wmul1(__n)  ((__n) * BITS_PER_LONG)
+#define wmul2(__n)  (wmul1(__n) * BITS_PER_LONG)
+#define wmul3(__n)  (wmul2(__n) * BITS_PER_LONG)
+
+static inline int __ipipe_next_irq(struct ipipe_percpu_domain_data *p)
+{
+	unsigned long l0m, l1m, l2m, l3m;
+	int l0b, l1b, l2b, l3b;
+	unsigned int irq;
+
+	l0m = p->irqpend_0map;
+	if (unlikely(l0m == 0))
+		return -1;
+	l0b = __ipipe_ffnz(l0m);
+	irq = wmul3(l0b);
+
+	l1m = p->irqpend_1map[l0b];
+	if (unlikely(l1m == 0))
+		return -1;
+	l1b = __ipipe_ffnz(l1m);
+	irq += wmul2(l1b);
+
+	l2m = p->irqpend_2map[wmul1(l0b) + l1b];
+	if (unlikely(l2m == 0))
+		return -1;
+	l2b = __ipipe_ffnz(l2m);
+	irq += wmul1(l2b);
+
+	l3m = p->irqpend_map[wmul2(l0b) + wmul1(l1b) + l2b];
+	if (unlikely(l3m == 0))
+		return -1;
+	l3b = __ipipe_ffnz(l3m);
+	irq += l3b;
+
+	__clear_bit(irq, p->irqpend_map);
+	if (p->irqpend_map[irq / BITS_PER_LONG] == 0) {
+		__clear_bit(l2b, &p->irqpend_2map[wmul1(l0b) + l1b]);
+		if (p->irqpend_2map[wmul1(l0b) + l1b] == 0) {
+			__clear_bit(l1b, &p->irqpend_1map[l0b]);
+			if (p->irqpend_1map[l0b] == 0)
+				__clear_bit(l0b, &p->irqpend_0map);
+		}
+	}
+
+	return irq;
+}
+
+#elif __IPIPE_IRQMAP_LEVELS == 3
+
+/* Must be called hw IRQs off. */
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq)
+{
+	struct ipipe_percpu_domain_data *p = ipipe_this_cpu_context(ipd);
+	int l0b, l1b;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	l0b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+	l1b = irq / BITS_PER_LONG;
+
+	if (likely(!test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))) {
+		__set_bit(irq, p->irqpend_map);
+		__set_bit(l1b, p->irqpend_1map);
+		__set_bit(l0b, &p->irqpend_0map);
+	} else
+		__set_bit(irq, p->irqheld_map);
+
+	p->irqall[irq]++;
+}
+EXPORT_SYMBOL_GPL(__ipipe_set_irq_pending);
+
+/* Must be called hw IRQs off. */
+void __ipipe_lock_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = ipipe_root_domain;
+	struct ipipe_percpu_domain_data *p;
+	int l0b, l1b;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	/*
+	 * Interrupts requested by a registered head domain cannot be
+	 * locked, since this would make no sense: interrupts are
+	 * globally masked at CPU level when the head domain is
+	 * stalled, so there is no way we could encounter the
+	 * situation IRQ locks are handling.
+	 */
+	if (test_and_set_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+		return;
+
+	l0b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+	l1b = irq / BITS_PER_LONG;
+
+	p = ipipe_this_cpu_context(ipd);
+	if (__test_and_clear_bit(irq, p->irqpend_map)) {
+		__set_bit(irq, p->irqheld_map);
+		if (p->irqpend_map[l1b] == 0) {
+			__clear_bit(l1b, p->irqpend_1map);
+			if (p->irqpend_1map[l0b] == 0)
+				__clear_bit(l0b, &p->irqpend_0map);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_lock_irq);
+
+/* Must be called hw IRQs off. */
+void __ipipe_unlock_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = ipipe_root_domain;
+	struct ipipe_percpu_domain_data *p;
+	int l0b, l1b, cpu;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	if (!test_and_clear_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+		return;
+
+	l0b = irq / (BITS_PER_LONG * BITS_PER_LONG);
+	l1b = irq / BITS_PER_LONG;
+
+	for_each_online_cpu(cpu) {
+		p = ipipe_percpu_context(ipd, cpu);
+		if (test_and_clear_bit(irq, p->irqheld_map)) {
+			/* We need atomic ops here: */
+			set_bit(irq, p->irqpend_map);
+			set_bit(l1b, p->irqpend_1map);
+			set_bit(l0b, &p->irqpend_0map);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_unlock_irq);
+
+static inline int __ipipe_next_irq(struct ipipe_percpu_domain_data *p)
+{
+	int l0b, l1b, l2b;
+	unsigned long l0m, l1m, l2m;
+	unsigned int irq;
+
+	l0m = p->irqpend_0map;
+	if (unlikely(l0m == 0))
+		return -1;
+
+	l0b = __ipipe_ffnz(l0m);
+	l1m = p->irqpend_1map[l0b];
+	if (unlikely(l1m == 0))
+		return -1;
+
+	l1b = __ipipe_ffnz(l1m) + l0b * BITS_PER_LONG;
+	l2m = p->irqpend_map[l1b];
+	if (unlikely(l2m == 0))
+		return -1;
+
+	l2b = __ipipe_ffnz(l2m);
+	irq = l1b * BITS_PER_LONG + l2b;
+
+	__clear_bit(irq, p->irqpend_map);
+	if (p->irqpend_map[l1b] == 0) {
+		__clear_bit(l1b, p->irqpend_1map);
+		if (p->irqpend_1map[l0b] == 0)
+			__clear_bit(l0b, &p->irqpend_0map);
+	}
+
+	return irq;
+}
+
+#else /* __IPIPE_IRQMAP_LEVELS == 2 */
+
+/* Must be called hw IRQs off. */
+void __ipipe_set_irq_pending(struct ipipe_domain *ipd, unsigned int irq)
+{
+	struct ipipe_percpu_domain_data *p = ipipe_this_cpu_context(ipd);
+	int l0b = irq / BITS_PER_LONG;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	if (likely(!test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))) {
+		__set_bit(irq, p->irqpend_map);
+		__set_bit(l0b, &p->irqpend_0map);
+	} else
+		__set_bit(irq, p->irqheld_map);
+
+	p->irqall[irq]++;
+}
+EXPORT_SYMBOL_GPL(__ipipe_set_irq_pending);
+
+/* Must be called hw IRQs off. */
+void __ipipe_lock_irq(unsigned int irq)
+{
+	struct ipipe_percpu_domain_data *p;
+	int l0b = irq / BITS_PER_LONG;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	if (test_and_set_bit(IPIPE_LOCK_FLAG,
+			     &ipipe_root_domain->irqs[irq].control))
+		return;
+
+	p = ipipe_this_cpu_root_context();
+	if (__test_and_clear_bit(irq, p->irqpend_map)) {
+		__set_bit(irq, p->irqheld_map);
+		if (p->irqpend_map[l0b] == 0)
+			__clear_bit(l0b, &p->irqpend_0map);
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_lock_irq);
+
+/* Must be called hw IRQs off. */
+void __ipipe_unlock_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = ipipe_root_domain;
+	struct ipipe_percpu_domain_data *p;
+	int l0b = irq / BITS_PER_LONG, cpu;
+
+	IPIPE_WARN_ONCE(!hard_irqs_disabled());
+
+	if (!test_and_clear_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+		return;
+
+	for_each_online_cpu(cpu) {
+		p = ipipe_percpu_context(ipd, cpu);
+		if (test_and_clear_bit(irq, p->irqheld_map)) {
+			/* We need atomic ops here: */
+			set_bit(irq, p->irqpend_map);
+			set_bit(l0b, &p->irqpend_0map);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_unlock_irq);
+
+static inline int __ipipe_next_irq(struct ipipe_percpu_domain_data *p)
+{
+	unsigned long l0m, l1m;
+	int l0b, l1b;
+
+	l0m = p->irqpend_0map;
+	if (unlikely(l0m == 0))
+		return -1;
+
+	l0b = __ipipe_ffnz(l0m);
+	l1m = p->irqpend_map[l0b];
+	if (unlikely(l1m == 0))
+		return -1;
+
+	l1b = __ipipe_ffnz(l1m);
+	__clear_bit(l1b, &p->irqpend_map[l0b]);
+	if (p->irqpend_map[l0b] == 0)
+		__clear_bit(l0b, &p->irqpend_0map);
+
+	return l0b * BITS_PER_LONG + l1b;
+}
+
+#endif
+
+void __ipipe_do_sync_pipeline(struct ipipe_domain *top)
+{
+	struct ipipe_percpu_domain_data *p;
+	struct ipipe_domain *ipd;
+
+	/* We must enter over the root domain. */
+	IPIPE_WARN_ONCE(__ipipe_current_domain != ipipe_root_domain);
+	ipd = top;
+next:
+	p = ipipe_this_cpu_context(ipd);
+	if (test_bit(IPIPE_STALL_FLAG, &p->status))
+		return;
+
+	if (__ipipe_ipending_p(p)) {
+		if (ipd == ipipe_root_domain)
+			__ipipe_sync_stage();
+		else {
+			/* Switching to head. */
+			p->coflags &= ~__IPIPE_ALL_R;
+			__ipipe_set_current_context(p);
+			__ipipe_sync_stage();
+			__ipipe_set_current_domain(ipipe_root_domain);
+		}
+	}
+
+	if (ipd != ipipe_root_domain) {
+		ipd = ipipe_root_domain;
+		goto next;
+	}
+}
+EXPORT_SYMBOL_GPL(__ipipe_do_sync_pipeline);
+
+unsigned int ipipe_alloc_virq(void)
+{
+	unsigned long flags, irq = 0;
+	int ipos;
+
+	raw_spin_lock_irqsave(&__ipipe_lock, flags);
+
+	if (__ipipe_virtual_irq_map != ~0) {
+		ipos = ffz(__ipipe_virtual_irq_map);
+		set_bit(ipos, &__ipipe_virtual_irq_map);
+		irq = ipos + IPIPE_VIRQ_BASE;
+	}
+
+	raw_spin_unlock_irqrestore(&__ipipe_lock, flags);
+
+	return irq;
+}
+EXPORT_SYMBOL_GPL(ipipe_alloc_virq);
+
+void ipipe_free_virq(unsigned int virq)
+{
+	clear_bit(virq - IPIPE_VIRQ_BASE, &__ipipe_virtual_irq_map);
+	smp_mb__after_atomic();
+}
+EXPORT_SYMBOL_GPL(ipipe_free_virq);
+
+int ipipe_request_irq(struct ipipe_domain *ipd,
+		      unsigned int irq,
+		      ipipe_irq_handler_t handler,
+		      void *cookie,
+		      ipipe_irq_ackfn_t ackfn)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	ipipe_root_only();
+
+	if (handler == NULL ||
+	    (irq >= IPIPE_NR_XIRQS && !ipipe_virtual_irq_p(irq)))
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&__ipipe_lock, flags);
+
+	if (ipd->irqs[irq].handler) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (ackfn == NULL)
+		ackfn = ipipe_root_domain->irqs[irq].ackfn;
+
+	ipd->irqs[irq].handler = handler;
+	ipd->irqs[irq].cookie = cookie;
+	ipd->irqs[irq].ackfn = ackfn;
+	ipd->irqs[irq].control = IPIPE_HANDLE_MASK;
+out:
+	raw_spin_unlock_irqrestore(&__ipipe_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ipipe_request_irq);
+
+void ipipe_free_irq(struct ipipe_domain *ipd,
+		    unsigned int irq)
+{
+	unsigned long flags;
+
+	ipipe_root_only();
+
+	raw_spin_lock_irqsave(&__ipipe_lock, flags);
+
+	if (ipd->irqs[irq].handler == NULL)
+		goto out;
+
+	ipd->irqs[irq].handler = NULL;
+	ipd->irqs[irq].cookie = NULL;
+	ipd->irqs[irq].ackfn = NULL;
+	ipd->irqs[irq].control = 0;
+out:
+	raw_spin_unlock_irqrestore(&__ipipe_lock, flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_free_irq);
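+
+/*
+ * Usage sketch (illustration only): the registration flow a minimal
+ * co-kernel would follow with the services above - register a head
+ * domain, allocate a virtual IRQ, install a handler for it, then
+ * trigger it.  Error handling is omitted and the demo_* names are
+ * made up.
+ *
+ *	static struct ipipe_domain demo_domain;
+ *	static unsigned int demo_virq;
+ *
+ *	static void demo_virq_handler(unsigned int irq, void *cookie)
+ *	{
+ *		// Runs over the head domain, hw IRQs off.
+ *	}
+ *
+ *	static int __init demo_init(void)
+ *	{
+ *		ipipe_register_head(&demo_domain, "demo");
+ *		demo_virq = ipipe_alloc_virq();
+ *		ipipe_request_irq(&demo_domain, demo_virq,
+ *				  demo_virq_handler, NULL, NULL);
+ *		ipipe_raise_irq(demo_virq);	// dispatched immediately
+ *		return 0;
+ *	}
+ */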
+
+void ipipe_set_hooks(struct ipipe_domain *ipd, int enables)
+{
+	struct ipipe_percpu_domain_data *p;
+	unsigned long flags;
+	int cpu, wait;
+
+	if (ipd == ipipe_root_domain) {
+		IPIPE_WARN(enables & __IPIPE_TRAP_E);
+		enables &= ~__IPIPE_TRAP_E;
+	} else {
+		IPIPE_WARN(enables & __IPIPE_KEVENT_E);
+		enables &= ~__IPIPE_KEVENT_E;
+	}
+
+	flags = ipipe_critical_enter(NULL);
+
+	for_each_online_cpu(cpu) {
+		p = ipipe_percpu_context(ipd, cpu);
+		p->coflags &= ~__IPIPE_ALL_E;
+		p->coflags |= enables;
+	}
+
+	wait = (enables ^ __IPIPE_ALL_E) << __IPIPE_SHIFT_R;
+	if (wait == 0 || !__ipipe_root_p) {
+		ipipe_critical_exit(flags);
+		return;
+	}
+
+	ipipe_this_cpu_context(ipd)->coflags &= ~wait;
+
+	ipipe_critical_exit(flags);
+
+	/*
+	 * In case we cleared some hooks over the root domain, we have
+	 * to wait for any ongoing execution to finish, since our
+	 * caller might subsequently unmap the target domain code.
+	 *
+	 * We synchronize with the relevant __ipipe_notify_*()
+	 * helpers, disabling all hooks before we start waiting for
+	 * completion on all CPUs.
+	 */
+	for_each_online_cpu(cpu) {
+		while (ipipe_percpu_context(ipd, cpu)->coflags & wait)
+			schedule_timeout_interruptible(HZ / 50);
+	}
+}
+EXPORT_SYMBOL_GPL(ipipe_set_hooks);
+
+int __weak ipipe_fastcall_hook(struct pt_regs *regs)
+{
+	return -1;	/* i.e. fall back to slow path. */
+}
+
+int __weak ipipe_syscall_hook(struct ipipe_domain *ipd, struct pt_regs *regs)
+{
+	return 0;
+}
+
+static inline void sync_root_irqs(void)
+{
+	struct ipipe_percpu_domain_data *p;
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+
+	p = ipipe_this_cpu_root_context();
+	if (unlikely(__ipipe_ipending_p(p)))
+		__ipipe_sync_stage();
+
+	hard_local_irq_restore(flags);
+}
+
+int ipipe_handle_syscall(struct thread_info *ti,
+			 unsigned long nr, struct pt_regs *regs)
+{
+	unsigned long local_flags = READ_ONCE(ti->ipipe_flags);
+	unsigned int nr_syscalls = ipipe_root_nr_syscalls(ti);
+	int ret;
+
+	/*
+	 * NOTE: This is a backport from the DOVETAIL syscall
+	 * redirector to the older pipeline implementation.
+	 *
+	 * ==
+	 *
+	 * If the syscall # is out of bounds and the current IRQ stage
+	 * is not the root one, this has to be a non-native system
+	 * call handled by some co-kernel on the head stage. Hand it
+	 * over to the head stage via the fast syscall handler.
+	 *
+	 * Otherwise, if the system call is out of bounds or the
+	 * current thread is shared with a co-kernel, hand the syscall
+	 * over to the latter through the pipeline stages. This
+	 * allows:
+	 *
+	 * - the co-kernel to receive the initial - foreign - syscall
+	 * a thread should send for enabling syscall handling by the
+	 * co-kernel.
+	 *
+	 * - the co-kernel to manipulate the current execution stage
+	 * for handling the request, which includes switching the
+	 * current thread back to the root stage if the syscall is a
+	 * native one, or promoting it to the head stage if handling
+	 * the foreign syscall requires this.
+	 *
+	 * Native syscalls from regular (non-pipeline) threads are
+	 * ignored by this routine, and flow down to the regular
+	 * system call handler.
+	 */
+
+	if (nr >= nr_syscalls && (local_flags & _TIP_HEAD)) {
+		ipipe_fastcall_hook(regs);
+		local_flags = READ_ONCE(ti->ipipe_flags);
+		if (local_flags & _TIP_HEAD) {
+			if (local_flags & _TIP_MAYDAY)
+				__ipipe_call_mayday(regs);
+			return 1; /* don't pass down, no tail work. */
+		} else {
+			sync_root_irqs();
+			return -1; /* don't pass down, do tail work. */
+		}
+	}
+
+	if ((local_flags & _TIP_NOTIFY) || nr >= nr_syscalls) {
+		ret = __ipipe_notify_syscall(regs);
+		local_flags = READ_ONCE(ti->ipipe_flags);
+		if (local_flags & _TIP_HEAD)
+			return 1; /* don't pass down, no tail work. */
+		if (ret)
+			return -1; /* don't pass down, do tail work. */
+	}
+
+	return 0; /* pass syscall down to the host. */
+}
+
+int __ipipe_notify_syscall(struct pt_regs *regs)
+{
+	struct ipipe_domain *caller_domain, *this_domain, *ipd;
+	struct ipipe_percpu_domain_data *p;
+	unsigned long flags;
+	int ret = 0;
+
+	/*
+	 * We should definitely not pipeline a syscall with IRQs off.
+	 */
+	IPIPE_WARN_ONCE(hard_irqs_disabled());
+
+	flags = hard_local_irq_save();
+	caller_domain = this_domain = __ipipe_current_domain;
+	ipd = ipipe_head_domain;
+next:
+	p = ipipe_this_cpu_context(ipd);
+	if (likely(p->coflags & __IPIPE_SYSCALL_E)) {
+		__ipipe_set_current_context(p);
+		p->coflags |= __IPIPE_SYSCALL_R;
+		hard_local_irq_restore(flags);
+		ret = ipipe_syscall_hook(caller_domain, regs);
+		flags = hard_local_irq_save();
+		p->coflags &= ~__IPIPE_SYSCALL_R;
+		if (__ipipe_current_domain != ipd)
+			/* Account for domain migration. */
+			this_domain = __ipipe_current_domain;
+		else
+			__ipipe_set_current_domain(this_domain);
+	}
+
+	if (this_domain == ipipe_root_domain) {
+		if (ipd != ipipe_root_domain && ret == 0) {
+			ipd = ipipe_root_domain;
+			goto next;
+		}
+		/*
+		 * Careful: we may have migrated from head->root, so p
+		 * would be ipipe_this_cpu_context(head).
+		 */
+		p = ipipe_this_cpu_root_context();
+		if (__ipipe_ipending_p(p))
+			__ipipe_sync_stage();
+	} else if (ipipe_test_thread_flag(TIP_MAYDAY))
+		__ipipe_call_mayday(regs);
+
+	hard_local_irq_restore(flags);
+
+	return ret;
+}
+
+int __weak ipipe_trap_hook(struct ipipe_trap_data *data)
+{
+	return 0;
+}
+
+int __ipipe_notify_trap(int exception, struct pt_regs *regs)
+{
+	struct ipipe_percpu_domain_data *p;
+	struct ipipe_trap_data data;
+	unsigned long flags;
+	int ret = 0;
+
+	flags = hard_local_irq_save();
+
+	/*
+	 * We send a notification about all traps raised over a
+	 * registered head domain only.
+	 */
+	if (__ipipe_root_p)
+		goto out;
+
+	p = ipipe_this_cpu_head_context();
+	if (likely(p->coflags & __IPIPE_TRAP_E)) {
+		p->coflags |= __IPIPE_TRAP_R;
+		hard_local_irq_restore(flags);
+		data.exception = exception;
+		data.regs = regs;
+		ret = ipipe_trap_hook(&data);
+		flags = hard_local_irq_save();
+		p->coflags &= ~__IPIPE_TRAP_R;
+	}
+out:
+	hard_local_irq_restore(flags);
+
+	return ret;
+}
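+
+/*
+ * Usage sketch (illustration only): a co-kernel would typically
+ * provide a strong version of the weak ipipe_trap_hook() above to
+ * hear about faults taken while running over the head domain, after
+ * turning on trap notifications with ipipe_set_hooks().  The return
+ * convention shown here is an assumption of this example: zero means
+ * "not handled, let the root domain process the fault".
+ *
+ *	int ipipe_trap_hook(struct ipipe_trap_data *data)
+ *	{
+ *		// data->exception and data->regs describe the fault.
+ *		return 0;
+ *	}
+ */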
+
+int __ipipe_notify_user_intreturn(void)
+{
+	__ipipe_notify_kevent(IPIPE_KEVT_USERINTRET, current);
+
+	return !ipipe_root_p;
+}
+
+int __weak ipipe_kevent_hook(int kevent, void *data)
+{
+	return 0;
+}
+
+int __ipipe_notify_kevent(int kevent, void *data)
+{
+	struct ipipe_percpu_domain_data *p;
+	unsigned long flags;
+	int ret = 0;
+
+	ipipe_root_only();
+
+	flags = hard_local_irq_save();
+
+	p = ipipe_this_cpu_root_context();
+	if (likely(p->coflags & __IPIPE_KEVENT_E)) {
+		p->coflags |= __IPIPE_KEVENT_R;
+		hard_local_irq_restore(flags);
+		ret = ipipe_kevent_hook(kevent, data);
+		flags = hard_local_irq_save();
+		p->coflags &= ~__IPIPE_KEVENT_R;
+	}
+
+	hard_local_irq_restore(flags);
+
+	return ret;
+}
+
+void __weak ipipe_migration_hook(struct task_struct *p)
+{
+}
+
+static void complete_domain_migration(void) /* hw IRQs off */
+{
+	struct ipipe_percpu_domain_data *p;
+	struct ipipe_percpu_data *pd;
+	struct task_struct *t;
+
+	ipipe_root_only();
+	pd = raw_cpu_ptr(&ipipe_percpu);
+	t = pd->task_hijacked;
+	if (t == NULL)
+		return;
+
+	pd->task_hijacked = NULL;
+	t->state &= ~TASK_HARDENING;
+	if (t->state != TASK_INTERRUPTIBLE)
+		/* Migration aborted (by signal). */
+		return;
+
+	ipipe_set_ti_thread_flag(task_thread_info(t), TIP_HEAD);
+	p = ipipe_this_cpu_head_context();
+	IPIPE_WARN_ONCE(test_bit(IPIPE_STALL_FLAG, &p->status));
+	/*
+	 * hw IRQs are disabled, but the completion hook assumes the
+	 * head domain is logically stalled: fix it up.
+	 */
+	__set_bit(IPIPE_STALL_FLAG, &p->status);
+	ipipe_migration_hook(t);
+	__clear_bit(IPIPE_STALL_FLAG, &p->status);
+	if (__ipipe_ipending_p(p))
+		__ipipe_sync_pipeline(p->domain);
+}
+
+void __ipipe_complete_domain_migration(void)
+{
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+	complete_domain_migration();
+	hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_complete_domain_migration);
+
+int __ipipe_switch_tail(void)
+{
+	int x;
+
+#ifdef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+	hard_local_irq_disable();
+#endif
+	x = __ipipe_root_p;
+	if (x)
+		complete_domain_migration();
+
+#ifndef CONFIG_IPIPE_WANT_PREEMPTIBLE_SWITCH
+	if (x)
+#endif
+		hard_local_irq_enable();
+
+	return !x;
+}
+
+void __ipipe_notify_vm_preemption(void)
+{
+	struct ipipe_vm_notifier *vmf;
+	struct ipipe_percpu_data *p;
+
+	ipipe_check_irqoff();
+	p = __ipipe_raw_cpu_ptr(&ipipe_percpu);
+	vmf = p->vm_notifier;
+	if (unlikely(vmf))
+		vmf->handler(vmf);
+}
+EXPORT_SYMBOL_GPL(__ipipe_notify_vm_preemption);
+
+static void dispatch_irq_head(unsigned int irq) /* hw interrupts off */
+{
+	struct ipipe_percpu_domain_data *p = ipipe_this_cpu_head_context(), *old;
+	struct ipipe_domain *head = p->domain;
+
+	if (unlikely(test_bit(IPIPE_STALL_FLAG, &p->status))) {
+		__ipipe_set_irq_pending(head, irq);
+		return;
+	}
+
+	/* Switch to the head domain if not current. */
+	old = __ipipe_current_context;
+	if (old != p)
+		__ipipe_set_current_context(p);
+
+	p->irqall[irq]++;
+	__set_bit(IPIPE_STALL_FLAG, &p->status);
+	barrier();
+	head->irqs[irq].handler(irq, head->irqs[irq].cookie);
+	__ipipe_run_irqtail(irq);
+	hard_local_irq_disable();
+	p = ipipe_this_cpu_head_context();
+	__clear_bit(IPIPE_STALL_FLAG, &p->status);
+
+	/* Are we still running in the head domain? */
+	if (likely(__ipipe_current_context == p)) {
+		/* Did we enter this code over the head domain? */
+		if (old->domain == head) {
+			/* Yes, do immediate synchronization. */
+			if (__ipipe_ipending_p(p))
+				__ipipe_sync_stage();
+			return;
+		}
+		__ipipe_set_current_context(ipipe_this_cpu_root_context());
+	}
+
+	/*
+	 * We must be running over the root domain, synchronize
+	 * the pipeline for high priority IRQs (slow path).
+	 */
+	__ipipe_do_sync_pipeline(head);
+}
+
+void __ipipe_dispatch_irq(unsigned int irq, int flags) /* hw interrupts off */
+{
+	struct ipipe_domain *ipd;
+	struct irq_desc *desc;
+	unsigned long control;
+	int chained_irq;
+
+	/*
+	 * Survival kit when reading this code:
+	 *
+	 * - we have two main situations, leading to three cases for
+	 *   handling interrupts:
+	 *
+	 *   a) the root domain is alone, no registered head domain
+	 *      => all interrupts go through the interrupt log
+	 *   b) a head domain is registered
+	 *      => head domain IRQs go through the fast dispatcher
+	 *      => root domain IRQs go through the interrupt log
+	 *
+	 * - when no head domain is registered, ipipe_head_domain ==
+	 *   ipipe_root_domain == &ipipe_root.
+	 *
+	 * - the caller tells us whether we should acknowledge this
+	 *   IRQ. Even virtual IRQs may need to be acknowledged on some
+	 *   platforms (e.g. arm/SMP).
+	 *
+	 * - the caller tells us whether we may try to run the IRQ log
+	 *   syncer. Typically, demuxed IRQs won't be synced
+	 *   immediately.
+	 *
+	 * - multiplex IRQs most likely have a valid acknowledge
+	 *   handler and we may not be called with IPIPE_IRQF_NOACK
+	 *   for them. The ack handler for the multiplex IRQ actually
+	 *   decodes the demuxed interrupts.
+	 */
+
+#ifdef CONFIG_IPIPE_DEBUG
+	if (irq >= IPIPE_NR_IRQS) {
+		pr_err("I-pipe: spurious interrupt %u\n", irq);
+		return;
+	}
+#endif
+	/*
+	 * CAUTION: on some archs, virtual IRQs may have acknowledge
+	 * handlers. Multiplex IRQs should have one too.
+	 */
+	if (unlikely(irq >= IPIPE_NR_XIRQS)) {
+		desc = NULL;
+		chained_irq = 0;
+	} else {
+		desc = irq_to_desc(irq);
+		chained_irq = desc ? ipipe_chained_irq_p(desc) : 0;
+	}
+	if (flags & IPIPE_IRQF_NOACK)
+		IPIPE_WARN_ONCE(chained_irq);
+	else {
+		ipd = ipipe_head_domain;
+		control = ipd->irqs[irq].control;
+		if ((control & IPIPE_HANDLE_MASK) == 0)
+			ipd = ipipe_root_domain;
+		if (ipd->irqs[irq].ackfn)
+			ipd->irqs[irq].ackfn(desc);
+		if (chained_irq) {
+			if ((flags & IPIPE_IRQF_NOSYNC) == 0)
+				/* Run demuxed IRQ handlers. */
+				goto sync;
+			return;
+		}
+	}
+
+	/*
+	 * Sticky interrupts must be handled early and separately, so
+	 * that we always process them on the current domain.
+	 */
+	ipd = __ipipe_current_domain;
+	control = ipd->irqs[irq].control;
+	if (control & IPIPE_STICKY_MASK)
+		goto log;
+
+	/*
+	 * In case we have no registered head domain
+	 * (i.e. ipipe_head_domain == &ipipe_root), we always go
+	 * through the interrupt log, and leave the dispatching work
+	 * ultimately to __ipipe_sync_pipeline().
+	 */
+	ipd = ipipe_head_domain;
+	control = ipd->irqs[irq].control;
+	if (ipd == ipipe_root_domain)
+		/*
+		 * The root domain must handle all interrupts, so
+		 * testing the HANDLE bit would be pointless.
+		 */
+		goto log;
+
+	if (control & IPIPE_HANDLE_MASK) {
+		if (unlikely(flags & IPIPE_IRQF_NOSYNC))
+			__ipipe_set_irq_pending(ipd, irq);
+		else
+			dispatch_irq_head(irq);
+		return;
+	}
+
+	ipd = ipipe_root_domain;
+log:
+	__ipipe_set_irq_pending(ipd, irq);
+
+	if (flags & IPIPE_IRQF_NOSYNC)
+		return;
+
+	/*
+	 * Optimize if we preempted a registered high priority head
+	 * domain: we don't need to synchronize the pipeline unless
+	 * there is a pending interrupt for it.
+	 */
+	if (!__ipipe_root_p &&
+	    !__ipipe_ipending_p(ipipe_this_cpu_head_context()))
+		return;
+sync:
+	__ipipe_sync_pipeline(ipipe_head_domain);
+}
+
+void ipipe_raise_irq(unsigned int irq)
+{
+	struct ipipe_domain *ipd = ipipe_head_domain;
+	unsigned long flags, control;
+
+	flags = hard_local_irq_save();
+
+	/*
+	 * Fast path: raising a virtual IRQ handled by the head
+	 * domain.
+	 */
+	if (likely(ipipe_virtual_irq_p(irq) && ipd != ipipe_root_domain)) {
+		control = ipd->irqs[irq].control;
+		if (likely(control & IPIPE_HANDLE_MASK)) {
+			dispatch_irq_head(irq);
+			goto out;
+		}
+	}
+
+	/* Emulate regular device IRQ receipt. */
+	__ipipe_dispatch_irq(irq, IPIPE_IRQF_NOACK);
+out:
+	hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_raise_irq);
+
+#ifdef CONFIG_PREEMPT
+
+void preempt_schedule_irq(void);
+
+void __sched __ipipe_preempt_schedule_irq(void)
+{
+	struct ipipe_percpu_domain_data *p;
+	unsigned long flags;
+
+	if (WARN_ON_ONCE(!hard_irqs_disabled()))
+		hard_local_irq_disable();
+
+	local_irq_save(flags);
+	hard_local_irq_enable();
+	preempt_schedule_irq(); /* Ok, may reschedule now. */
+	hard_local_irq_disable();
+
+	/*
+	 * Flush any pending interrupt that may have been logged after
+	 * preempt_schedule_irq() stalled the root stage before
+	 * returning to us, and now.
+	 */
+	p = ipipe_this_cpu_root_context();
+	if (unlikely(__ipipe_ipending_p(p))) {
+		trace_hardirqs_on();
+		__clear_bit(IPIPE_STALL_FLAG, &p->status);
+		__ipipe_sync_stage();
+	}
+
+	__ipipe_restore_root_nosync(flags);
+}
+
+#else /* !CONFIG_PREEMPT */
+
+#define __ipipe_preempt_schedule_irq()	do { } while (0)
+
+#endif	/* !CONFIG_PREEMPT */
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+#define root_stall_after_handler()	local_irq_disable()
+#else
+#define root_stall_after_handler()	do { } while (0)
+#endif
+
+/*
+ * __ipipe_do_sync_stage() -- Flush the pending IRQs for the current
+ * domain (and processor). This routine flushes the interrupt log (see
+ * "Optimistic interrupt protection" from D. Stodolsky et al. for more
+ * on the deferred interrupt scheme). Every interrupt that occurred
+ * while the pipeline was stalled gets played.
+ *
+ * WARNING: CPU migration may occur over this routine.
+ */
+void __ipipe_do_sync_stage(void)
+{
+	struct ipipe_percpu_domain_data *p;
+	struct ipipe_domain *ipd;
+	int irq;
+
+	p = __ipipe_current_context;
+respin:
+	ipd = p->domain;
+
+	__set_bit(IPIPE_STALL_FLAG, &p->status);
+	smp_wmb();
+
+	if (ipd == ipipe_root_domain)
+		trace_hardirqs_off();
+
+	for (;;) {
+		irq = __ipipe_next_irq(p);
+		if (irq < 0)
+			break;
+		/*
+		 * Make sure the compiler does not reorder wrongly, so
+		 * that all updates to maps are done before the
+		 * handler gets called.
+		 */
+		barrier();
+
+		if (test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control))
+			continue;
+
+		if (ipd != ipipe_head_domain)
+			hard_local_irq_enable();
+
+		if (likely(ipd != ipipe_root_domain)) {
+			ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
+			__ipipe_run_irqtail(irq);
+			hard_local_irq_disable();
+		} else if (ipipe_virtual_irq_p(irq)) {
+			irq_enter();
+			ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
+			irq_exit();
+			root_stall_after_handler();
+			hard_local_irq_disable();
+		} else {
+			ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie);
+			root_stall_after_handler();
+			hard_local_irq_disable();
+		}
+
+		/*
+		 * We may have migrated to a different CPU (1) upon
+		 * return from the handler, or downgraded from the
+		 * head domain to the root one (2); the opposite way
+		 * is NOT allowed though.
+		 *
+		 * (1) reload the current per-cpu context pointer, so
+		 * that we further pull pending interrupts from the
+		 * proper per-cpu log.
+		 *
+		 * (2) check the stall bit to know whether we may
+		 * dispatch any interrupt pending for the root domain,
+		 * and respin the entire dispatch loop if
+		 * so. Otherwise, immediately return to the caller,
+		 * _without_ affecting the stall state for the root
+		 * domain, since we do not own it at this stage.  This
+		 * case is basically reflecting what may happen in
+		 * dispatch_irq_head() for the fast path.
+		 */
+		p = __ipipe_current_context;
+		if (p->domain != ipd) {
+			IPIPE_BUG_ON(ipd == ipipe_root_domain);
+			if (test_bit(IPIPE_STALL_FLAG, &p->status))
+				return;
+			goto respin;
+		}
+	}
+
+	if (ipd == ipipe_root_domain)
+		trace_hardirqs_on();
+
+	__clear_bit(IPIPE_STALL_FLAG, &p->status);
+}
+
+void __ipipe_call_mayday(struct pt_regs *regs)
+{
+	unsigned long flags;
+
+	ipipe_clear_thread_flag(TIP_MAYDAY);
+	flags = hard_local_irq_save();
+	__ipipe_notify_trap(IPIPE_TRAP_MAYDAY, regs);
+	hard_local_irq_restore(flags);
+}
+
+#ifdef CONFIG_SMP
+
+/* Always called with hw interrupts off. */
+void __ipipe_do_critical_sync(unsigned int irq, void *cookie)
+{
+	int cpu = ipipe_processor_id();
+
+	cpumask_set_cpu(cpu, &__ipipe_cpu_sync_map);
+
+	/*
+	 * Now we are in sync with the lock requestor running on
+	 * another CPU. Enter a spinning wait until it releases the
+	 * global lock.
+	 */
+	raw_spin_lock(&__ipipe_cpu_barrier);
+
+	/* Got it. Now get out. */
+
+	/* Call the sync routine if any. */
+	if (__ipipe_cpu_sync)
+		__ipipe_cpu_sync();
+
+	cpumask_set_cpu(cpu, &__ipipe_cpu_pass_map);
+
+	raw_spin_unlock(&__ipipe_cpu_barrier);
+
+	cpumask_clear_cpu(cpu, &__ipipe_cpu_sync_map);
+}
+#endif	/* CONFIG_SMP */
+
+unsigned long ipipe_critical_enter(void (*syncfn)(void))
+{
+	static cpumask_t allbutself __maybe_unused, online __maybe_unused;
+	int cpu __maybe_unused, n __maybe_unused;
+	unsigned long flags, loops __maybe_unused;
+
+	flags = hard_local_irq_save();
+
+	if (num_online_cpus() == 1)
+		return flags;
+
+#ifdef CONFIG_SMP
+
+	cpu = ipipe_processor_id();
+	if (!cpumask_test_and_set_cpu(cpu, &__ipipe_cpu_lock_map)) {
+		while (test_and_set_bit(0, &__ipipe_critical_lock)) {
+			n = 0;
+			hard_local_irq_enable();
+
+			do
+				cpu_relax();
+			while (++n < cpu);
+
+			hard_local_irq_disable();
+		}
+restart:
+		online = *cpu_online_mask;
+		raw_spin_lock(&__ipipe_cpu_barrier);
+
+		__ipipe_cpu_sync = syncfn;
+
+		cpumask_clear(&__ipipe_cpu_pass_map);
+		cpumask_set_cpu(cpu, &__ipipe_cpu_pass_map);
+
+		/*
+		 * Send the sync IPI to all processors but the current
+		 * one.
+		 */
+		cpumask_andnot(&allbutself, &online, &__ipipe_cpu_pass_map);
+		ipipe_send_ipi(IPIPE_CRITICAL_IPI, allbutself);
+		loops = IPIPE_CRITICAL_TIMEOUT;
+
+		while (!cpumask_equal(&__ipipe_cpu_sync_map, &allbutself)) {
+			if (--loops > 0) {
+				cpu_relax();
+				continue;
+			}
+			/*
+			 * We ran into a deadlock due to a contended
+			 * rwlock. Cancel this round and retry.
+			 */
+			__ipipe_cpu_sync = NULL;
+
+			raw_spin_unlock(&__ipipe_cpu_barrier);
+			/*
+			 * Ensure all CPUs consumed the IPI to avoid
+			 * running __ipipe_cpu_sync prematurely. This
+			 * usually clears the cause of the deadlock too.
+			 */
+			while (!cpumask_equal(&online, &__ipipe_cpu_pass_map))
+				cpu_relax();
+
+			goto restart;
+		}
+	}
+
+	atomic_inc(&__ipipe_critical_count);
+
+#endif	/* CONFIG_SMP */
+
+	return flags;
+}
+EXPORT_SYMBOL_GPL(ipipe_critical_enter);
+
+void ipipe_critical_exit(unsigned long flags)
+{
+	if (num_online_cpus() == 1) {
+		hard_local_irq_restore(flags);
+		return;
+	}
+
+#ifdef CONFIG_SMP
+	if (atomic_dec_and_test(&__ipipe_critical_count)) {
+		raw_spin_unlock(&__ipipe_cpu_barrier);
+		while (!cpumask_empty(&__ipipe_cpu_sync_map))
+			cpu_relax();
+		cpumask_clear_cpu(ipipe_processor_id(), &__ipipe_cpu_lock_map);
+		clear_bit(0, &__ipipe_critical_lock);
+		smp_mb__after_atomic();
+	}
+#endif /* CONFIG_SMP */
+
+	hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_critical_exit);
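+
+/*
+ * Usage sketch (illustration only): updating a piece of global state
+ * with every other CPU quiesced, using the pair above.  The syncfn,
+ * if any, runs on each remote CPU once the initiator leaves the
+ * section.  The demo_* names are made up.
+ *
+ *	static void demo_sync(void)
+ *	{
+ *		// Per-CPU fixup, hw IRQs off, run when the barrier drops.
+ *	}
+ *
+ *	static void demo_apply_everywhere(void)
+ *	{
+ *		unsigned long flags;
+ *
+ *		flags = ipipe_critical_enter(demo_sync);
+ *		// Remote CPUs are spinning in the critical IPI handler here.
+ *		ipipe_critical_exit(flags);
+ *	}
+ */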
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+
+void ipipe_root_only(void)
+{
+	struct ipipe_domain *this_domain;
+	unsigned long flags;
+
+	flags = hard_smp_local_irq_save();
+
+	this_domain = __ipipe_current_domain;
+	if (likely(this_domain == ipipe_root_domain &&
+		   !test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status))) {
+		hard_smp_local_irq_restore(flags);
+		return;
+	}
+
+	if (!__this_cpu_read(ipipe_percpu.context_check)) {
+		hard_smp_local_irq_restore(flags);
+		return;
+	}
+
+	hard_smp_local_irq_restore(flags);
+
+	ipipe_prepare_panic();
+	ipipe_trace_panic_freeze();
+
+	if (this_domain != ipipe_root_domain)
+		pr_err("I-pipe: Detected illicit call from head domain '%s'\n"
+		       "        into a regular Linux service\n",
+		       this_domain->name);
+	else
+		pr_err("I-pipe: Detected stalled head domain, "
+			"probably caused by a bug.\n"
+			"        A critical section may have been "
+			"left unterminated.\n");
+	dump_stack();
+	ipipe_trace_panic_dump();
+}
+EXPORT_SYMBOL(ipipe_root_only);
+
+#endif /* CONFIG_IPIPE_DEBUG_CONTEXT */
+
+#if defined(CONFIG_IPIPE_DEBUG_INTERNAL) && defined(CONFIG_SMP)
+
+unsigned long notrace __ipipe_cpu_get_offset(void)
+{
+	struct ipipe_domain *this_domain;
+	unsigned long flags;
+	bool bad = false;
+
+	flags = hard_local_irq_save_notrace();
+	if (raw_irqs_disabled_flags(flags))
+		goto out;
+
+	/*
+	 * Only the root domain may implement preemptive CPU migration
+	 * of tasks, so anything above in the pipeline should be fine.
+	 * CAUTION: we want open coded access to the current domain,
+	 * don't use __ipipe_current_domain here, this would recurse
+	 * indefinitely.
+	 */
+	this_domain = raw_cpu_read(ipipe_percpu.curr)->domain;
+	if (this_domain != ipipe_root_domain)
+		goto out;
+
+	/*
+	 * Since we run on the root stage with hard irqs enabled, we
+	 * need preemption to be disabled.  Otherwise, our caller may
+	 * end up accessing the wrong per-cpu variable instance due to
+	 * CPU migration; complain loudly in that case.
+	 */
+	if (preempt_count() == 0 && !irqs_disabled())
+		bad = true;
+out:
+	hard_local_irq_restore_notrace(flags);
+
+	WARN_ON_ONCE(bad);
+
+	return __my_cpu_offset;
+}
+EXPORT_SYMBOL(__ipipe_cpu_get_offset);
+
+void __ipipe_spin_unlock_debug(unsigned long flags)
+{
+	/*
+	 * We catch a nasty issue where spin_unlock_irqrestore() on a
+	 * regular kernel spinlock is about to re-enable hw interrupts
+	 * in a section entered with hw irqs off. This is clearly the
+	 * sign of a massive breakage coming. Usual suspect is a
+	 * regular spinlock which was overlooked, used within a
+	 * section which must run with hw irqs disabled.
+	 */
+	IPIPE_WARN_ONCE(!raw_irqs_disabled_flags(flags) && hard_irqs_disabled());
+}
+EXPORT_SYMBOL(__ipipe_spin_unlock_debug);
+
+#endif /* CONFIG_IPIPE_DEBUG_INTERNAL && CONFIG_SMP */
+
+void ipipe_prepare_panic(void)
+{
+#ifdef CONFIG_PRINTK
+	__ipipe_printk_bypass = 1;
+#endif
+	ipipe_context_check_off();
+}
+EXPORT_SYMBOL_GPL(ipipe_prepare_panic);
+
+static void __ipipe_do_work(unsigned int virq, void *cookie)
+{
+	struct ipipe_work_header *work;
+	unsigned long flags;
+	void *curr, *tail;
+	int cpu;
+
+	/*
+	 * Work is dispatched in enqueuing order. This interrupt
+	 * context can't migrate to another CPU.
+	 */
+	cpu = smp_processor_id();
+	curr = per_cpu(work_buf, cpu);
+
+	for (;;) {
+		flags = hard_local_irq_save();
+		tail = per_cpu(work_tail, cpu);
+		if (curr == tail) {
+			per_cpu(work_tail, cpu) = per_cpu(work_buf, cpu);
+			hard_local_irq_restore(flags);
+			return;
+		}
+		work = curr;
+		curr += work->size;
+		hard_local_irq_restore(flags);
+		work->handler(work);
+	}
+}
+
+void __ipipe_post_work_root(struct ipipe_work_header *work)
+{
+	unsigned long flags;
+	void *tail;
+	int cpu;
+
+	/*
+	 * Subtle: we want to use the head stall/unstall operators,
+	 * not the hard_* routines to protect against races. This way,
+	 * we ensure that a root-based caller will trigger the virq
+	 * handling immediately when unstalling the head stage, as a
+	 * result of calling __ipipe_sync_pipeline() under the hood.
+	 */
+	flags = ipipe_test_and_stall_head();
+	cpu = ipipe_processor_id();
+	tail = per_cpu(work_tail, cpu);
+
+	if (WARN_ON_ONCE((unsigned char *)tail + work->size >=
+			 per_cpu(work_buf, cpu) + WORKBUF_SIZE))
+		goto out;
+
+	/* Work handling is deferred, so data has to be copied. */
+	memcpy(tail, work, work->size);
+	per_cpu(work_tail, cpu) = tail + work->size;
+	ipipe_post_irq_root(__ipipe_work_virq);
+out:
+	ipipe_restore_head(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_post_work_root);
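+
+/*
+ * Usage sketch (illustration only): deferring a callback from the
+ * head domain to the root stage through the work virq.  Only the
+ * size and handler fields used above are relied upon; the header
+ * must come first in the enclosing structure so that the copy made
+ * by __ipipe_post_work_root() covers the whole request.  The demo_*
+ * names are made up.
+ *
+ *	struct demo_work {
+ *		struct ipipe_work_header work;
+ *		int payload;
+ *	};
+ *
+ *	static void demo_work_handler(struct ipipe_work_header *work)
+ *	{
+ *		struct demo_work *dw;
+ *
+ *		dw = container_of(work, struct demo_work, work);
+ *		// Runs over the root domain; dw points at the private
+ *		// copy stored in the per-CPU work buffer.
+ *	}
+ *
+ *	static void demo_post_from_head(int value)
+ *	{
+ *		struct demo_work dw = {
+ *			.work = {
+ *				.size = sizeof(dw),
+ *				.handler = demo_work_handler,
+ *			},
+ *			.payload = value,
+ *		};
+ *
+ *		__ipipe_post_work_root(&dw.work);
+ *	}
+ */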
+
+void __weak __ipipe_arch_share_current(int flags)
+{
+}
+
+void __ipipe_share_current(int flags)
+{
+	ipipe_root_only();
+
+	__ipipe_arch_share_current(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_share_current);
+
+bool __weak ipipe_cpuidle_control(struct cpuidle_device *dev,
+				  struct cpuidle_state *state)
+{
+	/*
+	 * By default, always deny entering sleep state if this
+	 * entails stopping the timer (i.e. the C3STOP misfeature),
+	 * since Xenomai could not deal with this case.
+	 */
+	if (state && (state->flags & CPUIDLE_FLAG_TIMER_STOP))
+		return false;
+
+	/* Otherwise, allow switching to idle state. */
+	return true;
+}
+
+bool ipipe_enter_cpuidle(struct cpuidle_device *dev,
+			 struct cpuidle_state *state)
+{
+	struct ipipe_percpu_domain_data *p;
+
+	WARN_ON_ONCE(!irqs_disabled());
+
+	hard_local_irq_disable();
+	p = ipipe_this_cpu_root_context();
+
+	/*
+	 * Pending IRQ(s) waiting for delivery to the root stage, or
+	 * the arbitrary decision of a co-kernel may deny the
+	 * transition to a deeper C-state. Note that we return from
+	 * this call with hard irqs off, so that we won't allow any
+	 * interrupt to sneak into the IRQ log until we reach the
+	 * processor idling code, or leave the CPU idle framework
+	 * without sleeping.
+	 */
+	return !__ipipe_ipending_p(p) && ipipe_cpuidle_control(dev, state);
+}
+
+#if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || defined(CONFIG_PROVE_LOCKING) || \
+	defined(CONFIG_PREEMPT_VOLUNTARY) || defined(CONFIG_IPIPE_DEBUG_CONTEXT)
+void __ipipe_uaccess_might_fault(void)
+{
+	struct ipipe_percpu_domain_data *pdd;
+	struct ipipe_domain *ipd;
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+	ipd = __ipipe_current_domain;
+	if (ipd == ipipe_root_domain) {
+		hard_local_irq_restore(flags);
+		might_fault();
+		return;
+	}
+
+#ifdef CONFIG_IPIPE_DEBUG_CONTEXT
+	pdd = ipipe_this_cpu_context(ipd);
+	WARN_ON_ONCE(hard_irqs_disabled_flags(flags)
+		     || test_bit(IPIPE_STALL_FLAG, &pdd->status));
+#else /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+	(void)pdd;
+#endif /* !CONFIG_IPIPE_DEBUG_CONTEXT */
+	hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(__ipipe_uaccess_might_fault);
+#endif
diff --git a/kernel/ipipe/timer.c b/kernel/ipipe/timer.c
new file mode 100644
index 000000000000..83b3c94d92b3
--- /dev/null
+++ b/kernel/ipipe/timer.c
@@ -0,0 +1,656 @@
+/* -*- linux-c -*-
+ * linux/kernel/ipipe/timer.c
+ *
+ * Copyright (C) 2012 Gilles Chanteperdrix
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * I-pipe timer request interface.
+ */
+#include <linux/ipipe.h>
+#include <linux/percpu.h>
+#include <linux/irqdesc.h>
+#include <linux/cpumask.h>
+#include <linux/spinlock.h>
+#include <linux/ipipe_tickdev.h>
+#include <linux/interrupt.h>
+#include <linux/export.h>
+
+unsigned long __ipipe_hrtimer_freq;
+
+static LIST_HEAD(timers);
+static IPIPE_DEFINE_SPINLOCK(lock);
+
+static DEFINE_PER_CPU(struct ipipe_timer *, percpu_timer);
+
+/*
+ * Default request method: switch to oneshot mode if supported.
+ */
+static void ipipe_timer_default_request(struct ipipe_timer *timer, int steal)
+{
+	struct clock_event_device *evtdev = timer->host_timer;
+
+	if (!(evtdev->features & CLOCK_EVT_FEAT_ONESHOT))
+		return;
+
+	if (clockevent_state_oneshot(evtdev) ||
+		clockevent_state_oneshot_stopped(evtdev))
+		timer->orig_mode = CLOCK_EVT_MODE_ONESHOT;
+	else {
+		if (clockevent_state_periodic(evtdev))
+			timer->orig_mode = CLOCK_EVT_MODE_PERIODIC;
+		else if (clockevent_state_shutdown(evtdev))
+			timer->orig_mode = CLOCK_EVT_MODE_SHUTDOWN;
+		else
+			timer->orig_mode = CLOCK_EVT_MODE_UNUSED;
+		evtdev->set_state_oneshot(evtdev);
+		evtdev->set_next_event(timer->freq / HZ, evtdev);
+	}
+}
+
+/*
+ * Default release method: return the timer to the mode it had when
+ * starting.
+ */
+static void ipipe_timer_default_release(struct ipipe_timer *timer)
+{
+	struct clock_event_device *evtdev = timer->host_timer;
+
+	switch (timer->orig_mode) {
+	case CLOCK_EVT_MODE_SHUTDOWN:
+		evtdev->set_state_shutdown(evtdev);
+		break;
+	case CLOCK_EVT_MODE_PERIODIC:
+		evtdev->set_state_periodic(evtdev);
+		/* fall through */
+	case CLOCK_EVT_MODE_ONESHOT:
+		evtdev->set_next_event(timer->freq / HZ, evtdev);
+		break;
+	}
+}
+
+static int get_dev_mode(struct clock_event_device *evtdev)
+{
+	if (clockevent_state_oneshot(evtdev) ||
+		clockevent_state_oneshot_stopped(evtdev))
+		return CLOCK_EVT_MODE_ONESHOT;
+
+	if (clockevent_state_periodic(evtdev))
+		return CLOCK_EVT_MODE_PERIODIC;
+
+	if (clockevent_state_shutdown(evtdev))
+		return CLOCK_EVT_MODE_SHUTDOWN;
+
+	return CLOCK_EVT_MODE_UNUSED;
+}
+
+void ipipe_host_timer_register(struct clock_event_device *evtdev)
+{
+	struct ipipe_timer *timer = evtdev->ipipe_timer;
+
+	if (timer == NULL)
+		return;
+
+	timer->orig_mode = CLOCK_EVT_MODE_UNUSED;
+
+	if (timer->request == NULL)
+		timer->request = ipipe_timer_default_request;
+
+	/*
+	 * By default, use the same method as the Linux timer; on ARM
+	 * at least, most set_next_event methods are safe to call from
+	 * the Xenomai domain anyway.
+	 */
+	if (timer->set == NULL) {
+		timer->timer_set = evtdev;
+		timer->set = (typeof(timer->set))evtdev->set_next_event;
+	}
+
+	if (timer->release == NULL)
+		timer->release = ipipe_timer_default_release;
+
+	if (timer->name == NULL)
+		timer->name = evtdev->name;
+
+	if (timer->rating == 0)
+		timer->rating = evtdev->rating;
+
+	timer->freq = (1000000000ULL * evtdev->mult) >> evtdev->shift;
+
+	if (timer->min_delay_ticks == 0)
+		timer->min_delay_ticks =
+			(evtdev->min_delta_ns * evtdev->mult) >> evtdev->shift;
+
+	if (timer->max_delay_ticks == 0)
+		timer->max_delay_ticks =
+			(evtdev->max_delta_ns * evtdev->mult) >> evtdev->shift;
+
+	if (timer->cpumask == NULL)
+		timer->cpumask = evtdev->cpumask;
+
+	timer->host_timer = evtdev;
+
+	ipipe_timer_register(timer);
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+void ipipe_host_timer_cleanup(struct clock_event_device *evtdev)
+{
+	struct ipipe_timer *timer = evtdev->ipipe_timer;
+	unsigned long flags;
+
+	if (timer == NULL)
+		return;
+
+	raw_spin_lock_irqsave(&lock, flags);
+	list_del(&timer->link);
+	raw_spin_unlock_irqrestore(&lock, flags);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+/*
+ * Register a timer: timers are maintained in a list sorted by
+ * decreasing rating.
+ */
+void ipipe_timer_register(struct ipipe_timer *timer)
+{
+	struct ipipe_timer *t;
+	unsigned long flags;
+
+	if (timer->timer_set == NULL)
+		timer->timer_set = timer;
+
+	if (timer->cpumask == NULL)
+		timer->cpumask = cpumask_of(smp_processor_id());
+
+	raw_spin_lock_irqsave(&lock, flags);
+
+	list_for_each_entry(t, &timers, link) {
+		if (t->rating <= timer->rating) {
+			__list_add(&timer->link, t->link.prev, &t->link);
+			goto done;
+		}
+	}
+	list_add_tail(&timer->link, &timers);
+  done:
+	raw_spin_unlock_irqrestore(&lock, flags);
+}
+
+static void ipipe_timer_request_sync(void)
+{
+	struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+	struct clock_event_device *evtdev;
+	int steal;
+
+	if (!timer)
+		return;
+
+	evtdev = timer->host_timer;
+	steal = evtdev != NULL && !clockevent_state_detached(evtdev);
+	timer->request(timer, steal);
+}
+
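+/*
+ * Cache the hrclock-to-timer conversion factors used by
+ * ipipe_timer_set(): c2t_integ holds the integer part of the ratio,
+ * c2t_frac its fractional part scaled by 2^32.
+ */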
+static void config_pcpu_timer(struct ipipe_timer *t, unsigned hrclock_freq)
+{
+	unsigned long long tmp;
+	unsigned hrtimer_freq;
+
+	if (__ipipe_hrtimer_freq != t->freq)
+		__ipipe_hrtimer_freq = t->freq;
+
+	hrtimer_freq = t->freq;
+	if (__ipipe_hrclock_freq > UINT_MAX)
+		hrtimer_freq /= 1000;
+
+	t->c2t_integ = hrtimer_freq / hrclock_freq;
+	tmp = (((unsigned long long)
+		(hrtimer_freq % hrclock_freq)) << 32)
+		+ hrclock_freq - 1;
+	do_div(tmp, hrclock_freq);
+	t->c2t_frac = tmp;
+}
+
+/* Set up a timer as the I-pipe per-cpu timer for the given CPU */
+static void install_pcpu_timer(unsigned cpu, unsigned hrclock_freq,
+			      struct ipipe_timer *t)
+{
+	per_cpu(ipipe_percpu.hrtimer_irq, cpu) = t->irq;
+	per_cpu(percpu_timer, cpu) = t;
+	config_pcpu_timer(t, hrclock_freq);
+}
+
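+/*
+ * For a CPU which is not covered by the real-time core, install a
+ * per-cpu timer entry anyway when its timer shares an interrupt line
+ * with a timer already picked for an ipipe-managed CPU.
+ */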
+static void select_root_only_timer(unsigned cpu, unsigned hrclock_khz,
+				   const struct cpumask *mask,
+				   struct ipipe_timer *t)
+{
+	unsigned icpu;
+	struct clock_event_device *evtdev;
+
+	/*
+	 * If no ipipe-supported CPU shares an interrupt with the
+	 * timer, we do not need to care about it.
+	 */
+	for_each_cpu(icpu, mask) {
+		if (t->irq == per_cpu(ipipe_percpu.hrtimer_irq, icpu)) {
+			evtdev = t->host_timer;
+			if (evtdev && clockevent_state_shutdown(evtdev))
+				continue;
+			goto found;
+		}
+	}
+
+	return;
+
+found:
+	install_pcpu_timer(cpu, hrclock_khz, t);
+}
+
+/*
+ * Choose per-cpu timers with the highest rating by traversing the
+ * rating-sorted list for each CPU.
+ */
+int ipipe_select_timers(const struct cpumask *mask)
+{
+	unsigned hrclock_freq;
+	unsigned long long tmp;
+	struct ipipe_timer *t;
+	struct clock_event_device *evtdev;
+	unsigned long flags;
+	unsigned cpu;
+	cpumask_var_t fixup;
+
+	if (!__ipipe_hrclock_ok()) {
+		printk("I-pipe: high-resolution clock not working\n");
+		return -ENODEV;
+	}
+
+	if (__ipipe_hrclock_freq > UINT_MAX) {
+		tmp = __ipipe_hrclock_freq;
+		do_div(tmp, 1000);
+		hrclock_freq = tmp;
+	} else
+		hrclock_freq = __ipipe_hrclock_freq;
+
+	if (!zalloc_cpumask_var(&fixup, GFP_KERNEL)) {
+		WARN_ON(1);
+		return -ENODEV;
+	}
+
+	raw_spin_lock_irqsave(&lock, flags);
+
+	/* First, choose timers for the CPUs handled by ipipe */
+	for_each_cpu(cpu, mask) {
+		list_for_each_entry(t, &timers, link) {
+			if (!cpumask_test_cpu(cpu, t->cpumask))
+				continue;
+
+			evtdev = t->host_timer;
+			if (evtdev && clockevent_state_shutdown(evtdev))
+				continue;
+			goto found;
+		}
+
+		printk("I-pipe: could not find timer for cpu #%d\n",
+		       cpu);
+		goto err_remove_all;
+found:
+		install_pcpu_timer(cpu, hrclock_freq, t);
+	}
+
+	/*
+	 * Second, check if we need to fix up any CPUs not supported
+	 * by ipipe (but by Linux) whose interrupt may need to be
+	 * forwarded because they have the same IRQ as an ipipe-enabled
+	 * timer.
+	 */
+	cpumask_andnot(fixup, cpu_online_mask, mask);
+
+	for_each_cpu(cpu, fixup) {
+		list_for_each_entry(t, &timers, link) {
+			if (!cpumask_test_cpu(cpu, t->cpumask))
+				continue;
+
+			select_root_only_timer(cpu, hrclock_freq, mask, t);
+		}
+	}
+
+	raw_spin_unlock_irqrestore(&lock, flags);
+
+	free_cpumask_var(fixup);
+	flags = ipipe_critical_enter(ipipe_timer_request_sync);
+	ipipe_timer_request_sync();
+	ipipe_critical_exit(flags);
+
+	return 0;
+
+err_remove_all:
+	raw_spin_unlock_irqrestore(&lock, flags);
+	free_cpumask_var(fixup);
+
+	for_each_cpu(cpu, mask) {
+		per_cpu(ipipe_percpu.hrtimer_irq, cpu) = -1;
+		per_cpu(percpu_timer, cpu) = NULL;
+	}
+	__ipipe_hrtimer_freq = 0;
+
+	return -ENODEV;
+}
+
+static void ipipe_timer_release_sync(void)
+{
+	struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+	if (timer)
+		timer->release(timer);
+}
+
+void ipipe_timers_release(void)
+{
+	unsigned long flags;
+	unsigned cpu;
+
+	flags = ipipe_critical_enter(ipipe_timer_release_sync);
+	ipipe_timer_release_sync();
+	ipipe_critical_exit(flags);
+
+	for_each_online_cpu(cpu) {
+		per_cpu(ipipe_percpu.hrtimer_irq, cpu) = -1;
+		per_cpu(percpu_timer, cpu) = NULL;
+		__ipipe_hrtimer_freq = 0;
+	}
+}
+
+static void __ipipe_ack_hrtimer_irq(struct irq_desc *desc)
+{
+	struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+	/*
+	 * Pseudo-IRQs like pipelined IPIs have no descriptor, so we
+	 * have to check for this.
+	 */
+	if (desc)
+		desc->ipipe_ack(desc);
+
+	if (timer->ack)
+		timer->ack();
+
+	if (desc)
+		desc->ipipe_end(desc);
+}
+
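+/*
+ * Once the clockevent device has been stolen (see grab_timer()), the
+ * handlers below stand in for the original set_state_* callbacks,
+ * forwarding Linux mode-change requests to the emulation handler
+ * installed by the real-time core.
+ */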
+static int do_set_oneshot(struct clock_event_device *cdev)
+{
+	struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+	timer->orig_set_state_oneshot(cdev);
+	timer->mode_handler(CLOCK_EVT_MODE_ONESHOT, cdev);
+
+	return 0;
+}
+
+static int do_set_oneshot_stopped(struct clock_event_device *cdev)
+{
+	struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+	timer->mode_handler(CLOCK_EVT_MODE_SHUTDOWN, cdev);
+
+	return 0;
+}
+
+static int do_set_periodic(struct clock_event_device *cdev)
+{
+	struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+	timer->mode_handler(CLOCK_EVT_MODE_PERIODIC, cdev);
+
+	return 0;
+}
+
+static int do_set_shutdown(struct clock_event_device *cdev)
+{
+	struct ipipe_timer *timer = __ipipe_raw_cpu_read(percpu_timer);
+
+	timer->mode_handler(CLOCK_EVT_MODE_SHUTDOWN, cdev);
+
+	return 0;
+}
+
+int clockevents_program_event(struct clock_event_device *dev,
+			      ktime_t expires, bool force);
+
+struct grab_timer_data {
+	void (*tick_handler)(void);
+	void (*emumode)(enum clock_event_mode mode,
+			struct clock_event_device *cdev);
+	int (*emutick)(unsigned long evt,
+		       struct clock_event_device *cdev);
+	int retval;
+};
+
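+/*
+ * Runs on the target CPU with hard IRQs off: install the tick handler
+ * for the timer interrupt in the head domain, and divert the
+ * clockevent callbacks so that Linux mode changes and event
+ * programming go through the emulation handlers instead of the
+ * hardware.
+ */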
+static void grab_timer(void *arg)
+{
+	struct grab_timer_data *data = arg;
+	struct clock_event_device *evtdev;
+	struct ipipe_timer *timer;
+	struct irq_desc *desc;
+	unsigned long flags;
+	int steal, ret;
+
+	flags = hard_local_irq_save();
+
+	timer = this_cpu_read(percpu_timer);
+	evtdev = timer->host_timer;
+	ret = ipipe_request_irq(ipipe_head_domain, timer->irq,
+				(ipipe_irq_handler_t)data->tick_handler,
+				NULL, __ipipe_ack_hrtimer_irq);
+	if (ret < 0 && ret != -EBUSY) {
+		hard_local_irq_restore(flags);
+		data->retval = ret;
+		return;
+	}
+
+	steal = !clockevent_state_detached(evtdev);
+	if (steal && evtdev->ipipe_stolen == 0) {
+		timer->real_mult = evtdev->mult;
+		timer->real_shift = evtdev->shift;
+		timer->orig_set_state_periodic = evtdev->set_state_periodic;
+		timer->orig_set_state_oneshot = evtdev->set_state_oneshot;
+		timer->orig_set_state_oneshot_stopped = evtdev->set_state_oneshot_stopped;
+		timer->orig_set_state_shutdown = evtdev->set_state_shutdown;
+		timer->orig_set_next_event = evtdev->set_next_event;
+		timer->mode_handler = data->emumode;
+		evtdev->mult = 1;
+		evtdev->shift = 0;
+		evtdev->max_delta_ns = UINT_MAX;
+		if (timer->orig_set_state_periodic)
+			evtdev->set_state_periodic = do_set_periodic;
+		if (timer->orig_set_state_oneshot)
+			evtdev->set_state_oneshot = do_set_oneshot;
+		if (timer->orig_set_state_oneshot_stopped)
+			evtdev->set_state_oneshot_stopped = do_set_oneshot_stopped;
+		if (timer->orig_set_state_shutdown)
+			evtdev->set_state_shutdown = do_set_shutdown;
+		evtdev->set_next_event = data->emutick;
+		evtdev->ipipe_stolen = 1;
+	}
+
+	hard_local_irq_restore(flags);
+
+	data->retval = get_dev_mode(evtdev);
+
+	desc = irq_to_desc(timer->irq);
+	if (desc && irqd_irq_disabled(&desc->irq_data))
+		ipipe_enable_irq(timer->irq);
+
+	if (evtdev->ipipe_stolen && clockevent_state_oneshot(evtdev)) {
+		ret = clockevents_program_event(evtdev,
+						evtdev->next_event, true);
+		if (ret)
+			data->retval = ret;
+	}
+}
+
+int ipipe_timer_start(void (*tick_handler)(void),
+		      void (*emumode)(enum clock_event_mode mode,
+				      struct clock_event_device *cdev),
+		      int (*emutick)(unsigned long evt,
+				     struct clock_event_device *cdev),
+		      unsigned int cpu)
+{
+	struct grab_timer_data data;
+	int ret;
+
+	data.tick_handler = tick_handler;
+	data.emutick = emutick;
+	data.emumode = emumode;
+	data.retval = -EINVAL;
+	ret = smp_call_function_single(cpu, grab_timer, &data, true);
+
+	return ret ?: data.retval;
+}
+
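+/*
+ * Counterpart of grab_timer(), also running on the target CPU:
+ * release the head domain handler for the timer interrupt and restore
+ * the original clockevent callbacks and mult/shift values.
+ */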
+static void release_timer(void *arg)
+{
+	struct clock_event_device *evtdev;
+	struct ipipe_timer *timer;
+	struct irq_desc *desc;
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
+
+	timer = this_cpu_read(percpu_timer);
+
+	desc = irq_to_desc(timer->irq);
+	if (desc && irqd_irq_disabled(&desc->irq_data))
+		ipipe_disable_irq(timer->irq);
+
+	ipipe_free_irq(ipipe_head_domain, timer->irq);
+
+	evtdev = timer->host_timer;
+	if (evtdev && evtdev->ipipe_stolen) {
+		evtdev->mult = timer->real_mult;
+		evtdev->shift = timer->real_shift;
+		evtdev->set_state_periodic = timer->orig_set_state_periodic;
+		evtdev->set_state_oneshot = timer->orig_set_state_oneshot;
+		evtdev->set_state_oneshot_stopped = timer->orig_set_state_oneshot_stopped;
+		evtdev->set_state_shutdown = timer->orig_set_state_shutdown;
+		evtdev->set_next_event = timer->orig_set_next_event;
+		evtdev->ipipe_stolen = 0;
+		hard_local_irq_restore(flags);
+		if (clockevent_state_oneshot(evtdev))
+			clockevents_program_event(evtdev,
+						  evtdev->next_event, true);
+	} else
+		hard_local_irq_restore(flags);
+}
+
+void ipipe_timer_stop(unsigned int cpu)
+{
+	smp_call_function_single(cpu, release_timer, NULL, true);
+}
+
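+/*
+ * Program the next shot on the current CPU's timer. The delay is
+ * given in hrclock ticks, converted to device ticks using the
+ * pre-computed c2t factors and clamped to the device limits. If
+ * programming fails, the timer interrupt is raised immediately.
+ */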
+void ipipe_timer_set(unsigned long cdelay)
+{
+	unsigned long tdelay;
+	struct ipipe_timer *t;
+
+	t = __ipipe_raw_cpu_read(percpu_timer);
+
+	/*
+	 * Even though some architectures may use a 64-bit delay
+	 * here, we voluntarily limit it to 32 bits: 4 billion ticks
+	 * should be enough for now. Should a timer need more, an
+	 * extra call to the tick handler would simply occur after 4
+	 * billion ticks.
+	 */
+	if (cdelay > UINT_MAX)
+		cdelay = UINT_MAX;
+
+	tdelay = cdelay;
+	if (t->c2t_integ != 1)
+		tdelay *= t->c2t_integ;
+	if (t->c2t_frac)
+		tdelay += ((unsigned long long)cdelay * t->c2t_frac) >> 32;
+	if (tdelay < t->min_delay_ticks)
+		tdelay = t->min_delay_ticks;
+	if (tdelay > t->max_delay_ticks)
+		tdelay = t->max_delay_ticks;
+
+	if (t->set(tdelay, t->timer_set) < 0)
+		ipipe_raise_irq(t->irq);
+}
+EXPORT_SYMBOL_GPL(ipipe_timer_set);
+
+const char *ipipe_timer_name(void)
+{
+	return per_cpu(percpu_timer, 0)->name;
+}
+EXPORT_SYMBOL_GPL(ipipe_timer_name);
+
+unsigned ipipe_timer_ns2ticks(struct ipipe_timer *timer, unsigned ns)
+{
+	unsigned long long tmp;
+	BUG_ON(!timer->freq);
+	tmp = (unsigned long long)ns * timer->freq;
+	do_div(tmp, 1000000000);
+	return tmp;
+}
+
+#ifdef CONFIG_IPIPE_HAVE_HOSTRT
+/* NOTE: The event receiver is responsible for providing proper locking. */
+void ipipe_update_hostrt(struct timekeeper *tk)
+{
+	struct tk_read_base *tkr = &tk->tkr_mono;
+	struct clocksource *clock = tkr->clock;
+	struct ipipe_hostrt_data data;
+	struct timespec xt;
+
+	if (clock != &__ipipe_hostrt_clock)
+		return;
+
+	xt.tv_sec = tk->xtime_sec;
+	xt.tv_nsec = (long)(tkr->xtime_nsec >> tkr->shift);
+	ipipe_root_only();
+	data.live = 1;
+	data.cycle_last = tkr->cycle_last;
+	data.mask = clock->mask;
+	data.mult = tkr->mult;
+	data.shift = tkr->shift;
+	data.wall_time_sec = xt.tv_sec;
+	data.wall_time_nsec = xt.tv_nsec;
+	data.wall_to_monotonic.tv_sec = tk->wall_to_monotonic.tv_sec;
+	data.wall_to_monotonic.tv_nsec = tk->wall_to_monotonic.tv_nsec;
+	__ipipe_notify_kevent(IPIPE_KEVT_HOSTRT, &data);
+}
+
+#endif /* CONFIG_IPIPE_HAVE_HOSTRT */
+
+int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
+			      bool force);
+
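+/*
+ * The timer frequency may have changed (e.g. after a clock rate
+ * update): re-read it, refresh the conversion factors and reprogram
+ * the currently pending event.
+ */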
+void __ipipe_timer_refresh_freq(unsigned int hrclock_freq)
+{
+	struct ipipe_timer *t = __ipipe_raw_cpu_read(percpu_timer);
+	unsigned long flags;
+
+	if (t && t->refresh_freq) {
+		t->freq = t->refresh_freq();
+		flags = hard_local_irq_save();
+		config_pcpu_timer(t, hrclock_freq);
+		hard_local_irq_restore(flags);
+		clockevents_program_event(t->host_timer,
+					  t->host_timer->next_event, false);
+	}
+}
diff --git a/kernel/ipipe/tracer.c b/kernel/ipipe/tracer.c
new file mode 100644
index 000000000000..181d4df20a6a
--- /dev/null
+++ b/kernel/ipipe/tracer.c
@@ -0,0 +1,1524 @@
+/* -*- linux-c -*-
+ * kernel/ipipe/tracer.c
+ *
+ * Copyright (C) 2005 Luotao Fu.
+ *		 2005-2008 Jan Kiszka.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ * USA; either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/kallsyms.h>
+#include <linux/kdebug.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
+#include <linux/ctype.h>
+#include <linux/vmalloc.h>
+#include <linux/pid.h>
+#include <linux/vermagic.h>
+#include <linux/sched.h>
+#include <linux/ipipe.h>
+#include <linux/ftrace.h>
+#include <linux/uaccess.h>
+
+#define IPIPE_TRACE_PATHS	    4 /* <!> Do not lower below 3 */
+#define IPIPE_DEFAULT_ACTIVE	    0
+#define IPIPE_DEFAULT_MAX	    1
+#define IPIPE_DEFAULT_FROZEN	    2
+
+#define IPIPE_TRACE_POINTS	    (1 << CONFIG_IPIPE_TRACE_SHIFT)
+#define WRAP_POINT_NO(point)	    ((point) & (IPIPE_TRACE_POINTS-1))
+
+#define IPIPE_DEFAULT_PRE_TRACE	    10
+#define IPIPE_DEFAULT_POST_TRACE    10
+#define IPIPE_DEFAULT_BACK_TRACE    100
+
+#define IPIPE_DELAY_NOTE	    1000  /* in nanoseconds */
+#define IPIPE_DELAY_WARN	    10000 /* in nanoseconds */
+
+#define IPIPE_TFLG_NMI_LOCK	    0x0001
+#define IPIPE_TFLG_NMI_HIT	    0x0002
+#define IPIPE_TFLG_NMI_FREEZE_REQ   0x0004
+
+#define IPIPE_TFLG_HWIRQ_OFF	    0x0100
+#define IPIPE_TFLG_FREEZING	    0x0200
+#define IPIPE_TFLG_CURRDOM_SHIFT    10	 /* bits 10..11: current domain */
+#define IPIPE_TFLG_CURRDOM_MASK	    0x0C00
+#define IPIPE_TFLG_DOMSTATE_SHIFT   12	 /* bits 12..15: domain stalled? */
+#define IPIPE_TFLG_DOMSTATE_BITS    1
+
+#define IPIPE_TFLG_DOMAIN_STALLED(point, n) \
+	(point->flags & (1 << (n + IPIPE_TFLG_DOMSTATE_SHIFT)))
+#define IPIPE_TFLG_CURRENT_DOMAIN(point) \
+	((point->flags & IPIPE_TFLG_CURRDOM_MASK) >> IPIPE_TFLG_CURRDOM_SHIFT)
+
+struct ipipe_trace_point {
+	short type;
+	short flags;
+	unsigned long eip;
+	unsigned long parent_eip;
+	unsigned long v;
+	unsigned long long timestamp;
+};
+
+struct ipipe_trace_path {
+	volatile int flags;
+	int dump_lock; /* separated from flags due to cross-cpu access */
+	int trace_pos; /* next point to fill */
+	int begin, end; /* finalised path begin and end */
+	int post_trace; /* non-zero when in post-trace phase */
+	unsigned long long length; /* max path length in cycles */
+	unsigned long nmi_saved_eip; /* for deferred requests from NMIs */
+	unsigned long nmi_saved_parent_eip;
+	unsigned long nmi_saved_v;
+	struct ipipe_trace_point point[IPIPE_TRACE_POINTS];
+} ____cacheline_aligned_in_smp;
+
+enum ipipe_trace_type
+{
+	IPIPE_TRACE_FUNC = 0,
+	IPIPE_TRACE_BEGIN,
+	IPIPE_TRACE_END,
+	IPIPE_TRACE_FREEZE,
+	IPIPE_TRACE_SPECIAL,
+	IPIPE_TRACE_PID,
+	IPIPE_TRACE_EVENT,
+};
+
+#define IPIPE_TYPE_MASK		    0x0007
+#define IPIPE_TYPE_BITS		    3
+
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+static DEFINE_PER_CPU(struct ipipe_trace_path *, trace_path);
+#else /* !CONFIG_IPIPE_TRACE_VMALLOC */
+static DEFINE_PER_CPU(struct ipipe_trace_path, trace_path[IPIPE_TRACE_PATHS]) =
+	{ [0 ... IPIPE_TRACE_PATHS-1] = { .begin = -1, .end = -1 } };
+#endif /* CONFIG_IPIPE_TRACE_VMALLOC */
+
+int ipipe_trace_enable = 0;
+
+static DEFINE_PER_CPU(int, active_path) = { IPIPE_DEFAULT_ACTIVE };
+static DEFINE_PER_CPU(int, max_path) = { IPIPE_DEFAULT_MAX };
+static DEFINE_PER_CPU(int, frozen_path) = { IPIPE_DEFAULT_FROZEN };
+static IPIPE_DEFINE_SPINLOCK(global_path_lock);
+static int pre_trace = IPIPE_DEFAULT_PRE_TRACE;
+static int post_trace = IPIPE_DEFAULT_POST_TRACE;
+static int back_trace = IPIPE_DEFAULT_BACK_TRACE;
+static int verbose_trace = 1;
+static unsigned long trace_overhead;
+
+static unsigned long trigger_begin;
+static unsigned long trigger_end;
+
+static DEFINE_MUTEX(out_mutex);
+static struct ipipe_trace_path *print_path;
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+static struct ipipe_trace_path *panic_path;
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+static int print_pre_trace;
+static int print_post_trace;
+
+
+static long __ipipe_signed_tsc2us(long long tsc);
+static void
+__ipipe_trace_point_type(char *buf, struct ipipe_trace_point *point);
+static void __ipipe_print_symname(struct seq_file *m, unsigned long eip);
+
+static inline void store_states(struct ipipe_domain *ipd,
+				struct ipipe_trace_point *point, int pos)
+{
+	if (test_bit(IPIPE_STALL_FLAG, &ipipe_this_cpu_context(ipd)->status))
+		point->flags |= 1 << (pos + IPIPE_TFLG_DOMSTATE_SHIFT);
+
+	if (ipd == __ipipe_current_domain)
+		point->flags |= pos << IPIPE_TFLG_CURRDOM_SHIFT;
+}
+
+static notrace void
+__ipipe_store_domain_states(struct ipipe_trace_point *point)
+{
+	store_states(ipipe_root_domain, point, 0);
+	if (ipipe_head_domain != ipipe_root_domain)
+		store_states(ipipe_head_domain, point, 1);
+}
+
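+/*
+ * Pick the next path slot which is neither the current worst-case
+ * path, nor the frozen path, nor a path currently being dumped over
+ * /proc.
+ */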
+static notrace int __ipipe_get_free_trace_path(int old, int cpu)
+{
+	int new_active = old;
+	struct ipipe_trace_path *tp;
+
+	do {
+		if (++new_active == IPIPE_TRACE_PATHS)
+			new_active = 0;
+		tp = &per_cpu(trace_path, cpu)[new_active];
+	} while (new_active == per_cpu(max_path, cpu) ||
+		 new_active == per_cpu(frozen_path, cpu) ||
+		 tp->dump_lock);
+
+	return new_active;
+}
+
+static notrace void
+__ipipe_migrate_pre_trace(struct ipipe_trace_path *new_tp,
+			  struct ipipe_trace_path *old_tp, int old_pos)
+{
+	int i;
+
+	new_tp->trace_pos = pre_trace+1;
+
+	for (i = new_tp->trace_pos; i > 0; i--)
+		memcpy(&new_tp->point[WRAP_POINT_NO(new_tp->trace_pos-i)],
+		print_pre_trace	 = back_trace-1; /* subtract the freeze point */
+		       sizeof(struct ipipe_trace_point));
+
+	/* mark the end (i.e. the point before point[0]) invalid */
+	new_tp->point[IPIPE_TRACE_POINTS-1].eip = 0;
+}
+
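+/*
+ * A traced path just ended: if it is longer than the current per-cpu
+ * worst case, promote it to max_path and switch tracing to a fresh
+ * path buffer, preserving the pre-trace points.
+ */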
+static notrace struct ipipe_trace_path *
+__ipipe_trace_end(int cpu, struct ipipe_trace_path *tp, int pos)
+{
+	struct ipipe_trace_path *old_tp = tp;
+	long active = per_cpu(active_path, cpu);
+	unsigned long long length;
+
+	/* do we have a new worst case? */
+	length = tp->point[tp->end].timestamp -
+		 tp->point[tp->begin].timestamp;
+	if (length > per_cpu(trace_path, cpu)[per_cpu(max_path, cpu)].length) {
+		/* we need protection here against other cpus trying
+		   to start a proc dump */
+		raw_spin_lock(&global_path_lock);
+
+		/* active path holds new worst case */
+		tp->length = length;
+		per_cpu(max_path, cpu) = active;
+
+		/* find next unused trace path */
+		active = __ipipe_get_free_trace_path(active, cpu);
+
+		raw_spin_unlock(&global_path_lock);
+
+		tp = &per_cpu(trace_path, cpu)[active];
+
+		/* migrate last entries for pre-tracing */
+		__ipipe_migrate_pre_trace(tp, old_tp, pos);
+	}
+
+	return tp;
+}
+
+static notrace struct ipipe_trace_path *
+__ipipe_trace_freeze(int cpu, struct ipipe_trace_path *tp, int pos)
+{
+	struct ipipe_trace_path *old_tp = tp;
+	long active = per_cpu(active_path, cpu);
+	int n;
+
+	/* frozen paths have no core (begin=end) */
+	tp->begin = tp->end;
+
+	/* we need protection here against other cpus trying
+	 * to set their frozen path or to start a proc dump */
+	raw_spin_lock(&global_path_lock);
+
+	per_cpu(frozen_path, cpu) = active;
+
+	/* find next unused trace path */
+	active = __ipipe_get_free_trace_path(active, cpu);
+
+	/* check if this is the first frozen path */
+	for_each_possible_cpu(n) {
+		if (n != cpu &&
+		    per_cpu(trace_path, n)[per_cpu(frozen_path, n)].end >= 0)
+			tp->end = -1;
+	}
+
+	raw_spin_unlock(&global_path_lock);
+
+	tp = &per_cpu(trace_path, cpu)[active];
+
+	/* migrate last entries for pre-tracing */
+	__ipipe_migrate_pre_trace(tp, old_tp, pos);
+
+	return tp;
+}
+
+void notrace
+__ipipe_trace(enum ipipe_trace_type type, unsigned long eip,
+	      unsigned long parent_eip, unsigned long v)
+{
+	struct ipipe_trace_path *tp, *old_tp;
+	int pos, next_pos, begin;
+	struct ipipe_trace_point *point;
+	unsigned long flags;
+	int cpu;
+
+	flags = hard_local_irq_save_notrace();
+
+	cpu = ipipe_processor_id();
+ restart:
+	tp = old_tp = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+	/* here starts a race window with NMIs - caught below */
+
+	/* check for NMI recursion */
+	if (unlikely(tp->flags & IPIPE_TFLG_NMI_LOCK)) {
+		tp->flags |= IPIPE_TFLG_NMI_HIT;
+
+		/* first freeze request from NMI context? */
+		if ((type == IPIPE_TRACE_FREEZE) &&
+		    !(tp->flags & IPIPE_TFLG_NMI_FREEZE_REQ)) {
+			/* save arguments and mark deferred freezing */
+			tp->flags |= IPIPE_TFLG_NMI_FREEZE_REQ;
+			tp->nmi_saved_eip = eip;
+			tp->nmi_saved_parent_eip = parent_eip;
+			tp->nmi_saved_v = v;
+		}
+		return; /* no need for restoring flags inside IRQ */
+	}
+
+	/* clear NMI events and set lock (atomically per cpu) */
+	tp->flags = (tp->flags & ~(IPIPE_TFLG_NMI_HIT |
+				   IPIPE_TFLG_NMI_FREEZE_REQ))
+			       | IPIPE_TFLG_NMI_LOCK;
+
+	/* check active_path again - some nasty NMI may have switched
+	 * it meanwhile */
+	if (unlikely(tp !=
+		     &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)])) {
+		/* release lock on wrong path and restart */
+		tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+
+		/* there is no chance that the NMI got deferred
+		 * => no need to check for pending freeze requests */
+		goto restart;
+	}
+
+	/* get the point buffer */
+	pos = tp->trace_pos;
+	point = &tp->point[pos];
+
+	/* store all trace point data */
+	point->type = type;
+	point->flags = hard_irqs_disabled_flags(flags) ? IPIPE_TFLG_HWIRQ_OFF : 0;
+	point->eip = eip;
+	point->parent_eip = parent_eip;
+	point->v = v;
+	ipipe_read_tsc(point->timestamp);
+
+	__ipipe_store_domain_states(point);
+
+	/* forward to next point buffer */
+	next_pos = WRAP_POINT_NO(pos+1);
+	tp->trace_pos = next_pos;
+
+	/* only mark beginning if we haven't started yet */
+	begin = tp->begin;
+	if (unlikely(type == IPIPE_TRACE_BEGIN) && (begin < 0))
+		tp->begin = pos;
+
+	/* end of critical path, start post-trace if not already started */
+	if (unlikely(type == IPIPE_TRACE_END) &&
+	    (begin >= 0) && !tp->post_trace)
+		tp->post_trace = post_trace + 1;
+
+	/* freeze only if the slot is free and we are not already freezing */
+	if ((unlikely(type == IPIPE_TRACE_FREEZE) ||
+	     (unlikely(eip >= trigger_begin && eip <= trigger_end) &&
+	     type == IPIPE_TRACE_FUNC)) &&
+	    per_cpu(trace_path, cpu)[per_cpu(frozen_path, cpu)].begin < 0 &&
+	    !(tp->flags & IPIPE_TFLG_FREEZING)) {
+		tp->post_trace = post_trace + 1;
+		tp->flags |= IPIPE_TFLG_FREEZING;
+	}
+
+	/* enforce end of trace in case of overflow */
+	if (unlikely(WRAP_POINT_NO(next_pos + 1) == begin)) {
+		tp->end = pos;
+		goto enforce_end;
+	}
+
+	/* stop tracing this path if we are in post-trace and
+	 *  a) that phase is over now or
+	 *  b) a new TRACE_BEGIN came in but we are not freezing this path */
+	if (unlikely((tp->post_trace > 0) && ((--tp->post_trace == 0) ||
+		     ((type == IPIPE_TRACE_BEGIN) &&
+		      !(tp->flags & IPIPE_TFLG_FREEZING))))) {
+		/* store the path's end (i.e. excluding post-trace) */
+		tp->end = WRAP_POINT_NO(pos - post_trace + tp->post_trace);
+
+ enforce_end:
+		if (tp->flags & IPIPE_TFLG_FREEZING)
+			tp = __ipipe_trace_freeze(cpu, tp, pos);
+		else
+			tp = __ipipe_trace_end(cpu, tp, pos);
+
+		/* reset the active path, maybe already start a new one */
+		tp->begin = (type == IPIPE_TRACE_BEGIN) ?
+			WRAP_POINT_NO(tp->trace_pos - 1) : -1;
+		tp->end = -1;
+		tp->post_trace = 0;
+		tp->flags = 0;
+
+		/* update active_path not earlier to avoid races with NMIs */
+		per_cpu(active_path, cpu) = tp - per_cpu(trace_path, cpu);
+	}
+
+	/* we still have old_tp and point,
+	 * let's reset NMI lock and check for catches */
+	old_tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+	if (unlikely(old_tp->flags & IPIPE_TFLG_NMI_HIT)) {
+		/* well, this late tagging may not immediately be visible to
+		 * other cpus already dumping this path - a minor issue */
+		point->flags |= IPIPE_TFLG_NMI_HIT;
+
+		/* handle deferred freezing from NMI context */
+		if (old_tp->flags & IPIPE_TFLG_NMI_FREEZE_REQ)
+			__ipipe_trace(IPIPE_TRACE_FREEZE, old_tp->nmi_saved_eip,
+				      old_tp->nmi_saved_parent_eip,
+				      old_tp->nmi_saved_v);
+	}
+
+	hard_local_irq_restore_notrace(flags);
+}
+
+static unsigned long __ipipe_global_path_lock(void)
+{
+	unsigned long flags;
+	int cpu;
+	struct ipipe_trace_path *tp;
+
+	raw_spin_lock_irqsave(&global_path_lock, flags);
+
+	cpu = ipipe_processor_id();
+ restart:
+	tp = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+	/* here is a small race window with NMIs - caught below */
+
+	/* clear NMI events and set lock (atomically per cpu) */
+	tp->flags = (tp->flags & ~(IPIPE_TFLG_NMI_HIT |
+				   IPIPE_TFLG_NMI_FREEZE_REQ))
+			       | IPIPE_TFLG_NMI_LOCK;
+
+	/* check active_path again - some nasty NMI may have switched
+	 * it meanwhile */
+	if (tp != &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)]) {
+		/* release lock on wrong path and restart */
+		tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+
+		/* there is no chance that the NMI got deferred
+		 * => no need to check for pending freeze requests */
+		goto restart;
+	}
+
+	return flags;
+}
+
+static void __ipipe_global_path_unlock(unsigned long flags)
+{
+	int cpu;
+	struct ipipe_trace_path *tp;
+
+	/* release spinlock first - it's not involved in the NMI issue */
+	__ipipe_spin_unlock_irqbegin(&global_path_lock);
+
+	cpu = ipipe_processor_id();
+	tp = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+	tp->flags &= ~IPIPE_TFLG_NMI_LOCK;
+
+	/* handle deferred freezing from NMI context */
+	if (tp->flags & IPIPE_TFLG_NMI_FREEZE_REQ)
+		__ipipe_trace(IPIPE_TRACE_FREEZE, tp->nmi_saved_eip,
+			      tp->nmi_saved_parent_eip, tp->nmi_saved_v);
+
+	/* See __ipipe_spin_lock_irqsave() and friends. */
+	__ipipe_spin_unlock_irqcomplete(flags);
+}
+
+void notrace asmlinkage
+ipipe_trace_asm(enum ipipe_trace_type type, unsigned long eip,
+		unsigned long parent_eip, unsigned long v)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(type, eip, parent_eip, v);
+}
+
+void notrace ipipe_trace_begin(unsigned long v)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_BEGIN, CALLER_ADDR0,
+		      CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_begin);
+
+void notrace ipipe_trace_end(unsigned long v)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_END, CALLER_ADDR0,
+		      CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_end);
+
+void notrace ipipe_trace_irqbegin(int irq, struct pt_regs *regs)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_BEGIN, instruction_pointer(regs),
+		      CALLER_ADDR1, irq);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_irqbegin);
+
+void notrace ipipe_trace_irqend(int irq, struct pt_regs *regs)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_END, instruction_pointer(regs),
+		      CALLER_ADDR1, irq);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_irqend);
+
+void notrace ipipe_trace_freeze(unsigned long v)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_FREEZE, CALLER_ADDR0,
+		      CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_freeze);
+
+void notrace ipipe_trace_special(unsigned char id, unsigned long v)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_SPECIAL | (id << IPIPE_TYPE_BITS),
+		      CALLER_ADDR0,
+		      CALLER_ADDR1, v);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_special);
+
+void notrace ipipe_trace_pid(pid_t pid, short prio)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_PID | (prio << IPIPE_TYPE_BITS),
+		      CALLER_ADDR0,
+		      CALLER_ADDR1, pid);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_pid);
+
+void notrace ipipe_trace_event(unsigned char id, unsigned long delay_tsc)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_EVENT | (id << IPIPE_TYPE_BITS),
+		      CALLER_ADDR0,
+		      CALLER_ADDR1, delay_tsc);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_event);
+
+int ipipe_trace_max_reset(void)
+{
+	int cpu;
+	unsigned long flags;
+	struct ipipe_trace_path *path;
+	int ret = 0;
+
+	flags = __ipipe_global_path_lock();
+
+	for_each_possible_cpu(cpu) {
+		path = &per_cpu(trace_path, cpu)[per_cpu(max_path, cpu)];
+
+		if (path->dump_lock) {
+			ret = -EBUSY;
+			break;
+		}
+
+		path->begin	= -1;
+		path->end	= -1;
+		path->trace_pos = 0;
+		path->length	= 0;
+	}
+
+	__ipipe_global_path_unlock(flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_max_reset);
+
+int ipipe_trace_frozen_reset(void)
+{
+	int cpu;
+	unsigned long flags;
+	struct ipipe_trace_path *path;
+	int ret = 0;
+
+	flags = __ipipe_global_path_lock();
+
+	for_each_online_cpu(cpu) {
+		path = &per_cpu(trace_path, cpu)[per_cpu(frozen_path, cpu)];
+
+		if (path->dump_lock) {
+			ret = -EBUSY;
+			break;
+		}
+
+		path->begin = -1;
+		path->end = -1;
+		path->trace_pos = 0;
+		path->length	= 0;
+	}
+
+	__ipipe_global_path_unlock(flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_frozen_reset);
+
+static void
+__ipipe_get_task_info(char *task_info, struct ipipe_trace_point *point,
+		      int trylock)
+{
+	struct task_struct *task = NULL;
+	char buf[8];
+	int i;
+	int locked = 1;
+
+	if (trylock) {
+		if (!read_trylock(&tasklist_lock))
+			locked = 0;
+	} else
+		read_lock(&tasklist_lock);
+
+	if (locked)
+		task = find_task_by_pid_ns((pid_t)point->v, &init_pid_ns);
+
+	if (task)
+		strncpy(task_info, task->comm, 11);
+	else
+		strcpy(task_info, "-<?>-");
+
+	if (locked)
+		read_unlock(&tasklist_lock);
+
+	for (i = strlen(task_info); i < 11; i++)
+		task_info[i] = ' ';
+
+	sprintf(buf, " %d ", point->type >> IPIPE_TYPE_BITS);
+	strcpy(task_info + (11 - strlen(buf)), buf);
+}
+
+static void
+__ipipe_get_event_date(char *buf, struct ipipe_trace_path *path,
+		       struct ipipe_trace_point *point)
+{
+	long time;
+	int type;
+
+	time = __ipipe_signed_tsc2us(point->timestamp -
+				     path->point[path->begin].timestamp + point->v);
+	type = point->type >> IPIPE_TYPE_BITS;
+
+	if (type == 0)
+		/*
+		 * Event type #0 is predefined and stands for the next
+		 * timer tick.
+		 */
+		sprintf(buf, "tick@%-6ld", time);
+	else
+		sprintf(buf, "%3d@%-7ld", type, time);
+}
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+
+void ipipe_trace_panic_freeze(void)
+{
+	unsigned long flags;
+	int cpu;
+
+	if (!ipipe_trace_enable)
+		return;
+
+	ipipe_trace_enable = 0;
+	flags = hard_local_irq_save_notrace();
+
+	cpu = ipipe_processor_id();
+
+	panic_path = &per_cpu(trace_path, cpu)[per_cpu(active_path, cpu)];
+
+	hard_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_panic_freeze);
+
+void ipipe_trace_panic_dump(void)
+{
+	int cnt = back_trace;
+	int start, pos;
+	char buf[16];
+
+	if (!panic_path)
+		return;
+
+	ipipe_context_check_off();
+
+	printk(KERN_EMERG "I-pipe tracer log (%d points):\n", cnt);
+
+	start = pos = WRAP_POINT_NO(panic_path->trace_pos-1);
+
+	while (cnt-- > 0) {
+		struct ipipe_trace_point *point = &panic_path->point[pos];
+		long time;
+		char info[16];
+		int i;
+
+		printk(KERN_EMERG " %c",
+		       (point->flags & IPIPE_TFLG_HWIRQ_OFF) ? '|' : ' ');
+
+		for (i = IPIPE_TFLG_DOMSTATE_BITS; i >= 0; i--)
+			printk(KERN_CONT "%c",
+			       (IPIPE_TFLG_CURRENT_DOMAIN(point) == i) ?
+				(IPIPE_TFLG_DOMAIN_STALLED(point, i) ?
+					'#' : '+') :
+				(IPIPE_TFLG_DOMAIN_STALLED(point, i) ?
+					'*' : ' '));
+
+		if (!point->eip)
+			printk(KERN_CONT "-<invalid>-\n");
+		else {
+			__ipipe_trace_point_type(buf, point);
+			printk(KERN_CONT "%s", buf);
+
+			switch (point->type & IPIPE_TYPE_MASK) {
+				case IPIPE_TRACE_FUNC:
+					printk(KERN_CONT "           ");
+					break;
+
+				case IPIPE_TRACE_PID:
+					__ipipe_get_task_info(info,
+							      point, 1);
+					printk(KERN_CONT "%s", info);
+					break;
+
+				case IPIPE_TRACE_EVENT:
+					__ipipe_get_event_date(info,
+							       panic_path,
+							       point);
+					printk(KERN_CONT "%s", info);
+					break;
+
+				default:
+					printk(KERN_CONT "0x%08lx ", point->v);
+			}
+
+			time = __ipipe_signed_tsc2us(point->timestamp -
+				panic_path->point[start].timestamp);
+			printk(KERN_CONT " %5ld ", time);
+
+			__ipipe_print_symname(NULL, point->eip);
+			printk(KERN_CONT " (");
+			__ipipe_print_symname(NULL, point->parent_eip);
+			printk(KERN_CONT ")\n");
+		}
+		pos = WRAP_POINT_NO(pos - 1);
+	}
+
+	panic_path = NULL;
+}
+EXPORT_SYMBOL_GPL(ipipe_trace_panic_dump);
+
+static int ipipe_trace_panic_handler(struct notifier_block *this,
+				     unsigned long event, void *unused)
+{
+	ipipe_trace_panic_dump();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ipipe_trace_panic_notifier = {
+	.notifier_call  = ipipe_trace_panic_handler,
+	.priority       = 150,
+};
+
+static int ipipe_trace_die_handler(struct notifier_block *self,
+				   unsigned long val, void *data)
+{
+	switch (val) {
+	case DIE_OOPS:
+		ipipe_trace_panic_dump();
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ipipe_trace_die_notifier = {
+	.notifier_call = ipipe_trace_die_handler,
+	.priority = 200,
+};
+
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+
+
+/* --- /proc output --- */
+
+static notrace int __ipipe_in_critical_trpath(long point_no)
+{
+	return ((WRAP_POINT_NO(point_no-print_path->begin) <
+		 WRAP_POINT_NO(print_path->end-print_path->begin)) ||
+		((print_path->end == print_path->begin) &&
+		 (WRAP_POINT_NO(point_no-print_path->end) >
+		  print_post_trace)));
+}
+
+static long __ipipe_signed_tsc2us(long long tsc)
+{
+	unsigned long long abs_tsc;
+	long us;
+
+	if (!__ipipe_hrclock_ok())
+		return 0;
+
+	/* ipipe_tsc2us works on unsigned => handle sign separately */
+	abs_tsc = (tsc >= 0) ? tsc : -tsc;
+	us = ipipe_tsc2us(abs_tsc);
+	if (tsc < 0)
+		return -us;
+	else
+		return us;
+}
+
+static void
+__ipipe_trace_point_type(char *buf, struct ipipe_trace_point *point)
+{
+	switch (point->type & IPIPE_TYPE_MASK) {
+		case IPIPE_TRACE_FUNC:
+			strcpy(buf, "func    ");
+			break;
+
+		case IPIPE_TRACE_BEGIN:
+			strcpy(buf, "begin   ");
+			break;
+
+		case IPIPE_TRACE_END:
+			strcpy(buf, "end     ");
+			break;
+
+		case IPIPE_TRACE_FREEZE:
+			strcpy(buf, "freeze  ");
+			break;
+
+		case IPIPE_TRACE_SPECIAL:
+			sprintf(buf, "(0x%02x)  ",
+				point->type >> IPIPE_TYPE_BITS);
+			break;
+
+		case IPIPE_TRACE_PID:
+			sprintf(buf, "[%5d] ", (pid_t)point->v);
+			break;
+
+		case IPIPE_TRACE_EVENT:
+			sprintf(buf, "event   ");
+			break;
+	}
+}
+
+static void
+__ipipe_print_pathmark(struct seq_file *m, struct ipipe_trace_point *point)
+{
+	char mark = ' ';
+	int point_no = point - print_path->point;
+	int i;
+
+	if (print_path->end == point_no)
+		mark = '<';
+	else if (print_path->begin == point_no)
+		mark = '>';
+	else if (__ipipe_in_critical_trpath(point_no))
+		mark = ':';
+	seq_printf(m, "%c%c", mark,
+		   (point->flags & IPIPE_TFLG_HWIRQ_OFF) ? '|' : ' ');
+
+	if (!verbose_trace)
+		return;
+
+	for (i = IPIPE_TFLG_DOMSTATE_BITS; i >= 0; i--)
+		seq_printf(m, "%c",
+			(IPIPE_TFLG_CURRENT_DOMAIN(point) == i) ?
+			    (IPIPE_TFLG_DOMAIN_STALLED(point, i) ?
+				'#' : '+') :
+			(IPIPE_TFLG_DOMAIN_STALLED(point, i) ? '*' : ' '));
+}
+
+static void
+__ipipe_print_delay(struct seq_file *m, struct ipipe_trace_point *point)
+{
+	unsigned long delay = 0;
+	int next;
+	char *mark = "  ";
+
+	next = WRAP_POINT_NO(point+1 - print_path->point);
+
+	if (next != print_path->trace_pos)
+		delay = ipipe_tsc2ns(print_path->point[next].timestamp -
+				     point->timestamp);
+
+	if (__ipipe_in_critical_trpath(point - print_path->point)) {
+		if (delay > IPIPE_DELAY_WARN)
+			mark = "! ";
+		else if (delay > IPIPE_DELAY_NOTE)
+			mark = "+ ";
+	}
+	seq_puts(m, mark);
+
+	if (verbose_trace)
+		seq_printf(m, "%3lu.%03lu%c ", delay/1000, delay%1000,
+			   (point->flags & IPIPE_TFLG_NMI_HIT) ? 'N' : ' ');
+	else
+		seq_puts(m, " ");
+}
+
+static void __ipipe_print_symname(struct seq_file *m, unsigned long eip)
+{
+	char namebuf[KSYM_NAME_LEN+1];
+	unsigned long size, offset;
+	const char *sym_name;
+	char *modname;
+
+	sym_name = kallsyms_lookup(eip, &size, &offset, &modname, namebuf);
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+	if (!m) {
+		/* panic dump */
+		if (sym_name) {
+			printk(KERN_CONT "%s+0x%lx", sym_name, offset);
+			if (modname)
+				printk(KERN_CONT " [%s]", modname);
+		} else
+			printk(KERN_CONT "<%08lx>", eip);
+	} else
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+	{
+		if (sym_name) {
+			if (verbose_trace) {
+				seq_printf(m, "%s+0x%lx", sym_name, offset);
+				if (modname)
+					seq_printf(m, " [%s]", modname);
+			} else
+				seq_puts(m, sym_name);
+		} else
+			seq_printf(m, "<%08lx>", eip);
+	}
+}
+
+static void __ipipe_print_headline(struct seq_file *m)
+{
+	const char *name[2];
+
+	seq_printf(m, "Calibrated minimum trace-point overhead: %lu.%03lu "
+		   "us\n\n", trace_overhead/1000, trace_overhead%1000);
+
+	if (verbose_trace) {
+		name[0] = ipipe_root_domain->name;
+		if (ipipe_head_domain != ipipe_root_domain)
+			name[1] = ipipe_head_domain->name;
+		else
+			name[1] = "<unused>";
+
+		seq_printf(m,
+			   " +----- Hard IRQs ('|': locked)\n"
+			   " |+-- %s\n"
+			   " ||+- %s%s\n"
+		           " |||                        +---------- "
+			       "Delay flag ('+': > %d us, '!': > %d us)\n"
+			   " |||                        |        +- "
+			       "NMI noise ('N')\n"
+			   " |||                        |        |\n"
+			   "    Type    User Val.   Time    Delay  Function "
+			       "(Parent)\n",
+			   name[1], name[0],
+			   " ('*': domain stalled, '+': current, "
+			   "'#': current+stalled)",
+			   IPIPE_DELAY_NOTE/1000, IPIPE_DELAY_WARN/1000);
+	} else
+		seq_printf(m,
+			   " +--------------- Hard IRQs ('|': locked)\n"
+			   " |             +- Delay flag "
+			       "('+': > %d us, '!': > %d us)\n"
+			   " |             |\n"
+			   "  Type     Time   Function (Parent)\n",
+			   IPIPE_DELAY_NOTE/1000, IPIPE_DELAY_WARN/1000);
+}
+
+static void *__ipipe_max_prtrace_start(struct seq_file *m, loff_t *pos)
+{
+	loff_t n = *pos;
+
+	mutex_lock(&out_mutex);
+
+	if (!n) {
+		struct ipipe_trace_path *tp;
+		unsigned long length_usecs;
+		int points, cpu;
+		unsigned long flags;
+
+		/* protect against max_path/frozen_path updates while we
+		 * haven't locked our target path, also avoid recursively
+		 * taking global_path_lock from NMI context */
+		flags = __ipipe_global_path_lock();
+
+		/* find the longest of all per-cpu paths */
+		print_path = NULL;
+		for_each_online_cpu(cpu) {
+			tp = &per_cpu(trace_path, cpu)[per_cpu(max_path, cpu)];
+			if ((print_path == NULL) ||
+			    (tp->length > print_path->length)) {
+				print_path = tp;
+				break;
+			}
+		}
+		print_path->dump_lock = 1;
+
+		__ipipe_global_path_unlock(flags);
+
+		if (!__ipipe_hrclock_ok()) {
+			seq_printf(m, "No hrclock available, dumping traces disabled\n");
+			return NULL;
+		}
+
+		/* does this path actually contain data? */
+		if (print_path->end == print_path->begin)
+			return NULL;
+
+		/* number of points inside the critical path */
+		points = WRAP_POINT_NO(print_path->end-print_path->begin+1);
+
+		/* pre- and post-tracing length, post-trace length was frozen
+		   in __ipipe_trace, pre-trace may have to be reduced due to
+		   buffer overrun */
+		print_pre_trace	 = pre_trace;
+		print_post_trace = WRAP_POINT_NO(print_path->trace_pos -
+						 print_path->end - 1);
+		if (points+pre_trace+print_post_trace > IPIPE_TRACE_POINTS - 1)
+			print_pre_trace = IPIPE_TRACE_POINTS - 1 - points -
+				print_post_trace;
+
+		length_usecs = ipipe_tsc2us(print_path->length);
+		seq_printf(m, "I-pipe worst-case tracing service on %s/ipipe release #%d\n"
+			   "-------------------------------------------------------------\n",
+			UTS_RELEASE, IPIPE_CORE_RELEASE);
+		seq_printf(m, "CPU: %d, Begin: %lld cycles, Trace Points: "
+			"%d (-%d/+%d), Length: %lu us\n",
+			cpu, print_path->point[print_path->begin].timestamp,
+			points, print_pre_trace, print_post_trace, length_usecs);
+		__ipipe_print_headline(m);
+	}
+
+	/* check if we are inside the trace range */
+	if (n >= WRAP_POINT_NO(print_path->end - print_path->begin + 1 +
+			       print_pre_trace + print_post_trace))
+		return NULL;
+
+	/* return the next point to be shown */
+	return &print_path->point[WRAP_POINT_NO(print_path->begin -
+						print_pre_trace + n)];
+}
+
+static void *__ipipe_prtrace_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	loff_t n = ++*pos;
+
+	/* check if we are inside the trace range with the next entry */
+	if (n >= WRAP_POINT_NO(print_path->end - print_path->begin + 1 +
+			       print_pre_trace + print_post_trace))
+		return NULL;
+
+	/* return the next point to be shown */
+	return &print_path->point[WRAP_POINT_NO(print_path->begin -
+						print_pre_trace + *pos)];
+}
+
+static void __ipipe_prtrace_stop(struct seq_file *m, void *p)
+{
+	if (print_path)
+		print_path->dump_lock = 0;
+	mutex_unlock(&out_mutex);
+}
+
+static int __ipipe_prtrace_show(struct seq_file *m, void *p)
+{
+	long time;
+	struct ipipe_trace_point *point = p;
+	char buf[16];
+
+	if (!point->eip) {
+		seq_puts(m, "-<invalid>-\n");
+		return 0;
+	}
+
+	__ipipe_print_pathmark(m, point);
+	__ipipe_trace_point_type(buf, point);
+	seq_puts(m, buf);
+	if (verbose_trace)
+		switch (point->type & IPIPE_TYPE_MASK) {
+			case IPIPE_TRACE_FUNC:
+				seq_puts(m, "           ");
+				break;
+
+			case IPIPE_TRACE_PID:
+				__ipipe_get_task_info(buf, point, 0);
+				seq_puts(m, buf);
+				break;
+
+			case IPIPE_TRACE_EVENT:
+				__ipipe_get_event_date(buf, print_path, point);
+				seq_puts(m, buf);
+				break;
+
+			default:
+				seq_printf(m, "0x%08lx ", point->v);
+		}
+
+	time = __ipipe_signed_tsc2us(point->timestamp -
+		print_path->point[print_path->begin].timestamp);
+	seq_printf(m, "%5ld", time);
+
+	__ipipe_print_delay(m, point);
+	__ipipe_print_symname(m, point->eip);
+	seq_puts(m, " (");
+	__ipipe_print_symname(m, point->parent_eip);
+	seq_puts(m, ")\n");
+
+	return 0;
+}
+
+static struct seq_operations __ipipe_max_ptrace_ops = {
+	.start = __ipipe_max_prtrace_start,
+	.next  = __ipipe_prtrace_next,
+	.stop  = __ipipe_prtrace_stop,
+	.show  = __ipipe_prtrace_show
+};
+
+static int __ipipe_max_prtrace_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &__ipipe_max_ptrace_ops);
+}
+
+static ssize_t
+__ipipe_max_reset(struct file *file, const char __user *pbuffer,
+		  size_t count, loff_t *data)
+{
+	mutex_lock(&out_mutex);
+	ipipe_trace_max_reset();
+	mutex_unlock(&out_mutex);
+
+	return count;
+}
+
+static const struct file_operations __ipipe_max_prtrace_fops = {
+	.open	    = __ipipe_max_prtrace_open,
+	.read	    = seq_read,
+	.write	    = __ipipe_max_reset,
+	.llseek	    = seq_lseek,
+	.release    = seq_release,
+};
+
+static void *__ipipe_frozen_prtrace_start(struct seq_file *m, loff_t *pos)
+{
+	loff_t n = *pos;
+
+	mutex_lock(&out_mutex);
+
+	if (!n) {
+		struct ipipe_trace_path *tp;
+		int cpu;
+		unsigned long flags;
+
+		/* protect against max_path/frozen_path updates while we
+		 * haven't locked our target path, also avoid recursively
+		 * taking global_path_lock from NMI context */
+		flags = __ipipe_global_path_lock();
+
+		/* find the first of all per-cpu frozen paths */
+		print_path = NULL;
+		for_each_online_cpu(cpu) {
+			tp = &per_cpu(trace_path, cpu)[per_cpu(frozen_path, cpu)];
+			if (tp->end >= 0) {
+				print_path = tp;
+				break;
+			}
+		}
+		if (print_path)
+			print_path->dump_lock = 1;
+
+		__ipipe_global_path_unlock(flags);
+
+		if (!print_path)
+			return NULL;
+
+		if (!__ipipe_hrclock_ok()) {
+			seq_printf(m, "No hrclock available, dumping traces disabled\n");
+			return NULL;
+		}
+
+		/* back- and post-tracing length, post-trace length was frozen
+		   in __ipipe_trace, back-trace may have to be reduced due to
+		   buffer overrun */
+		print_pre_trace	 = back_trace-1; /* substract freeze point */
+		print_post_trace = WRAP_POINT_NO(print_path->trace_pos -
+						 print_path->end - 1);
+		if (1+pre_trace+print_post_trace > IPIPE_TRACE_POINTS - 1)
+			print_pre_trace = IPIPE_TRACE_POINTS - 2 -
+				print_post_trace;
+
+		seq_printf(m, "I-pipe frozen back-tracing service on %s/ipipe release #%d\n"
+			      "------------------------------------------------------------\n",
+			   UTS_RELEASE, IPIPE_CORE_RELEASE);
+		seq_printf(m, "CPU: %d, Freeze: %lld cycles, Trace Points: %d (+%d)\n",
+			cpu, print_path->point[print_path->begin].timestamp,
+			print_pre_trace+1, print_post_trace);
+		__ipipe_print_headline(m);
+	}
+
+	/* check if we are inside the trace range */
+	if (n >= print_pre_trace + 1 + print_post_trace)
+		return NULL;
+
+	/* return the next point to be shown */
+	return &print_path->point[WRAP_POINT_NO(print_path->begin-
+						print_pre_trace+n)];
+}
+
+static struct seq_operations __ipipe_frozen_ptrace_ops = {
+	.start = __ipipe_frozen_prtrace_start,
+	.next  = __ipipe_prtrace_next,
+	.stop  = __ipipe_prtrace_stop,
+	.show  = __ipipe_prtrace_show
+};
+
+static int __ipipe_frozen_prtrace_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &__ipipe_frozen_ptrace_ops);
+}
+
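+/*
+ * Writing to the 'frozen' control file resets the frozen path; a
+ * non-zero value additionally takes a new freeze point right away.
+ */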
+static ssize_t
+__ipipe_frozen_ctrl(struct file *file, const char __user *pbuffer,
+		    size_t count, loff_t *data)
+{
+	char *end, buf[16];
+	int val;
+	int n;
+
+	n = (count > sizeof(buf) - 1) ? sizeof(buf) - 1 : count;
+
+	if (copy_from_user(buf, pbuffer, n))
+		return -EFAULT;
+
+	buf[n] = '\0';
+	val = simple_strtol(buf, &end, 0);
+
+	if (((*end != '\0') && !isspace(*end)) || (val < 0))
+		return -EINVAL;
+
+	mutex_lock(&out_mutex);
+	ipipe_trace_frozen_reset();
+	if (val > 0)
+		ipipe_trace_freeze(-1);
+	mutex_unlock(&out_mutex);
+
+	return count;
+}
+
+static const struct file_operations __ipipe_frozen_prtrace_fops = {
+	.open	    = __ipipe_frozen_prtrace_open,
+	.read	    = seq_read,
+	.write	    = __ipipe_frozen_ctrl,
+	.llseek	    = seq_lseek,
+	.release    = seq_release,
+};
+
+static int __ipipe_rd_proc_val(struct seq_file *p, void *data)
+{
+	seq_printf(p, "%u\n", *(int *)p->private);
+	return 0;
+}
+
+static ssize_t
+__ipipe_wr_proc_val(struct file *file, const char __user *buffer,
+		    size_t count, loff_t *data)
+{
+	struct seq_file *p = file->private_data;
+	char *end, buf[16];
+	int val;
+	int n;
+
+	n = (count > sizeof(buf) - 1) ? sizeof(buf) - 1 : count;
+
+	if (copy_from_user(buf, buffer, n))
+		return -EFAULT;
+
+	buf[n] = '\0';
+	val = simple_strtol(buf, &end, 0);
+
+	if (((*end != '\0') && !isspace(*end)) || (val < 0))
+		return -EINVAL;
+
+	mutex_lock(&out_mutex);
+	*(int *)p->private = val;
+	mutex_unlock(&out_mutex);
+
+	return count;
+}
+
+static int __ipipe_rw_proc_val_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, __ipipe_rd_proc_val, PDE_DATA(inode));
+}
+
+static const struct file_operations __ipipe_rw_proc_val_ops = {
+	.open		= __ipipe_rw_proc_val_open,
+	.read		= seq_read,
+	.write		= __ipipe_wr_proc_val,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static void __init
+__ipipe_create_trace_proc_val(struct proc_dir_entry *trace_dir,
+			      const char *name, int *value_ptr)
+{
+	proc_create_data(name, 0644, trace_dir, &__ipipe_rw_proc_val_ops,
+			 value_ptr);
+}
+
+static int __ipipe_rd_trigger(struct seq_file *p, void *data)
+{
+	char str[KSYM_SYMBOL_LEN];
+
+	if (trigger_begin) {
+		sprint_symbol(str, trigger_begin);
+		seq_printf(p, "%s\n", str);
+	}
+	return 0;
+}
+
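+/*
+ * Writing a kernel symbol name to the 'trigger' control file arms a
+ * freeze trigger spanning that function: any traced function hit
+ * within this address range freezes the current path (see
+ * __ipipe_trace()).
+ */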
+static ssize_t
+__ipipe_wr_trigger(struct file *file, const char __user *buffer,
+		   size_t count, loff_t *data)
+{
+	char buf[KSYM_SYMBOL_LEN];
+	unsigned long begin, end;
+
+	if (count > sizeof(buf) - 1)
+		count = sizeof(buf) - 1;
+	if (copy_from_user(buf, buffer, count))
+		return -EFAULT;
+	buf[count] = 0;
+	if (buf[count-1] == '\n')
+		buf[count-1] = 0;
+
+	begin = kallsyms_lookup_name(buf);
+	if (!begin || !kallsyms_lookup_size_offset(begin, &end, NULL))
+		return -ENOENT;
+	end += begin - 1;
+
+	mutex_lock(&out_mutex);
+	/* invalidate the current range before setting a new one */
+	trigger_end = 0;
+	wmb();
+	ipipe_trace_frozen_reset();
+
+	/* set new range */
+	trigger_begin = begin;
+	wmb();
+	trigger_end = end;
+	mutex_unlock(&out_mutex);
+
+	return count;
+}
+
+static int __ipipe_rw_trigger_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, __ipipe_rd_trigger, NULL);
+}
+
+static const struct file_operations __ipipe_rw_trigger_ops = {
+	.open		= __ipipe_rw_trigger_open,
+	.read		= seq_read,
+	.write		= __ipipe_wr_trigger,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+
+#ifdef CONFIG_IPIPE_TRACE_MCOUNT
+static void notrace
+ipipe_trace_function(unsigned long ip, unsigned long parent_ip,
+		     struct ftrace_ops *op, struct pt_regs *regs)
+{
+	if (!ipipe_trace_enable)
+		return;
+	__ipipe_trace(IPIPE_TRACE_FUNC, ip, parent_ip, 0);
+}
+
+static struct ftrace_ops ipipe_trace_ops = {
+	.func = ipipe_trace_function,
+	.flags = FTRACE_OPS_FL_IPIPE_EXCLUSIVE,
+};
+
+static ssize_t __ipipe_wr_enable(struct file *file, const char __user *buffer,
+				 size_t count, loff_t *data)
+{
+	char *end, buf[16];
+	int val;
+	int n;
+
+	n = (count > sizeof(buf) - 1) ? sizeof(buf) - 1 : count;
+
+	if (copy_from_user(buf, buffer, n))
+		return -EFAULT;
+
+	buf[n] = '\0';
+	val = simple_strtol(buf, &end, 0);
+
+	if (((*end != '\0') && !isspace(*end)) || (val < 0))
+		return -EINVAL;
+
+	mutex_lock(&out_mutex);
+
+	if (ipipe_trace_enable) {
+		if (!val)
+			unregister_ftrace_function(&ipipe_trace_ops);
+	} else if (val)
+		register_ftrace_function(&ipipe_trace_ops);
+
+	ipipe_trace_enable = val;
+
+	mutex_unlock(&out_mutex);
+
+	return count;
+}
+
+static const struct file_operations __ipipe_rw_enable_ops = {
+	.open		= __ipipe_rw_proc_val_open,
+	.read		= seq_read,
+	.write		= __ipipe_wr_enable,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+#endif /* CONFIG_IPIPE_TRACE_MCOUNT */
+
+extern struct proc_dir_entry *ipipe_proc_root;
+
+void __init __ipipe_tracer_hrclock_initialized(void)
+{
+	unsigned long long start, end, min = ULLONG_MAX;
+	int i;
+
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+	if (!per_cpu(trace_path, 0))
+		return;
+#endif
+	/* Calculate minimum overhead of __ipipe_trace() */
+	hard_local_irq_disable();
+	for (i = 0; i < 100; i++) {
+		ipipe_read_tsc(start);
+		__ipipe_trace(IPIPE_TRACE_FUNC, CALLER_ADDR0,
+			      CALLER_ADDR1, 0);
+		ipipe_read_tsc(end);
+
+		end -= start;
+		if (end < min)
+			min = end;
+	}
+	hard_local_irq_enable();
+	trace_overhead = ipipe_tsc2ns(min);
+}
+
+void __init __ipipe_init_tracer(void)
+{
+	struct proc_dir_entry *trace_dir;
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+	int cpu, path;
+#endif /* CONFIG_IPIPE_TRACE_VMALLOC */
+
+#ifdef CONFIG_IPIPE_TRACE_VMALLOC
+	for_each_possible_cpu(cpu) {
+		struct ipipe_trace_path *tp_buf;
+
+		tp_buf = vmalloc_node(sizeof(struct ipipe_trace_path) *
+				      IPIPE_TRACE_PATHS, cpu_to_node(cpu));
+		if (!tp_buf) {
+			pr_err("I-pipe: "
+			       "insufficient memory for trace buffer.\n");
+			return;
+		}
+		memset(tp_buf, 0,
+		       sizeof(struct ipipe_trace_path) * IPIPE_TRACE_PATHS);
+		for (path = 0; path < IPIPE_TRACE_PATHS; path++) {
+			tp_buf[path].begin = -1;
+			tp_buf[path].end   = -1;
+		}
+		per_cpu(trace_path, cpu) = tp_buf;
+	}
+#endif /* CONFIG_IPIPE_TRACE_VMALLOC */
+
+	if (__ipipe_hrclock_ok() && !trace_overhead)
+		__ipipe_tracer_hrclock_initialized();
+
+#ifdef CONFIG_IPIPE_TRACE_ENABLE
+	ipipe_trace_enable = 1;
+#ifdef CONFIG_IPIPE_TRACE_MCOUNT
+	ftrace_enabled = 1;
+	register_ftrace_function(&ipipe_trace_ops);
+#endif /* CONFIG_IPIPE_TRACE_MCOUNT */
+#endif /* CONFIG_IPIPE_TRACE_ENABLE */
+
+	trace_dir = proc_mkdir("trace", ipipe_proc_root);
+
+	proc_create("max", 0644, trace_dir, &__ipipe_max_prtrace_fops);
+	proc_create("frozen", 0644, trace_dir, &__ipipe_frozen_prtrace_fops);
+
+	proc_create("trigger", 0644, trace_dir, &__ipipe_rw_trigger_ops);
+
+	__ipipe_create_trace_proc_val(trace_dir, "pre_trace_points",
+				      &pre_trace);
+	__ipipe_create_trace_proc_val(trace_dir, "post_trace_points",
+				      &post_trace);
+	__ipipe_create_trace_proc_val(trace_dir, "back_trace_points",
+				      &back_trace);
+	__ipipe_create_trace_proc_val(trace_dir, "verbose",
+				      &verbose_trace);
+#ifdef CONFIG_IPIPE_TRACE_MCOUNT
+	proc_create_data("enable", 0644, trace_dir, &__ipipe_rw_enable_ops,
+			 &ipipe_trace_enable);
+#else /* !CONFIG_IPIPE_TRACE_MCOUNT */
+	__ipipe_create_trace_proc_val(trace_dir, "enable",
+				      &ipipe_trace_enable);
+#endif /* !CONFIG_IPIPE_TRACE_MCOUNT */
+
+#ifdef CONFIG_IPIPE_TRACE_PANIC
+	atomic_notifier_chain_register(&panic_notifier_list,
+				       &ipipe_trace_panic_notifier);
+	register_die_notifier(&ipipe_trace_die_notifier);
+#endif /* CONFIG_IPIPE_TRACE_PANIC */
+}
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b76703b2c0af..fd476ca464e4 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -14,6 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 #include <linux/irqdomain.h>
+#include <linux/ipipe.h>
 
 #include <trace/events/irq.h>
 
@@ -48,6 +49,10 @@ int irq_set_chip(unsigned int irq, struct irq_chip *chip)
 
 	if (!chip)
 		chip = &no_irq_chip;
+	else
+		WARN_ONCE(IS_ENABLED(CONFIG_IPIPE) &&
+			  (chip->flags & IRQCHIP_PIPELINE_SAFE) == 0,
+			  "irqchip %s is not pipeline-safe!", chip->name);
 
 	desc->irq_data.chip = chip;
 	irq_put_desc_unlock(desc, flags);
@@ -155,14 +160,6 @@ int irq_set_chip_data(unsigned int irq, void *data)
 }
 EXPORT_SYMBOL(irq_set_chip_data);
 
-struct irq_data *irq_get_irq_data(unsigned int irq)
-{
-	struct irq_desc *desc = irq_to_desc(irq);
-
-	return desc ? &desc->irq_data : NULL;
-}
-EXPORT_SYMBOL_GPL(irq_get_irq_data);
-
 static void irq_state_clr_disabled(struct irq_desc *desc)
 {
 	irqd_clear(&desc->irq_data, IRQD_IRQ_DISABLED);
@@ -242,9 +239,14 @@ static int __irq_startup(struct irq_desc *desc)
 	WARN_ON_ONCE(!irqd_is_activated(d));
 
 	if (d->chip->irq_startup) {
+		unsigned long flags = hard_cond_local_irq_save();
 		ret = d->chip->irq_startup(d);
 		irq_state_clr_disabled(desc);
 		irq_state_clr_masked(desc);
+		hard_cond_local_irq_restore(flags);
+#ifdef CONFIG_IPIPE
+		desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
+#endif
 	} else {
 		irq_enable(desc);
 	}
@@ -309,6 +311,9 @@ void irq_shutdown(struct irq_desc *desc)
 			desc->irq_data.chip->irq_shutdown(&desc->irq_data);
 			irq_state_set_disabled(desc);
 			irq_state_set_masked(desc);
+#ifdef CONFIG_IPIPE
+			desc->istate |= IPIPE_IRQS_NEEDS_STARTUP;
+#endif
 		} else {
 			__irq_disable(desc, true);
 		}
@@ -331,6 +336,8 @@ void irq_shutdown_and_deactivate(struct irq_desc *desc)
 
 void irq_enable(struct irq_desc *desc)
 {
+	unsigned long flags = hard_cond_local_irq_save();
+
 	if (!irqd_irq_disabled(&desc->irq_data)) {
 		unmask_irq(desc);
 	} else {
@@ -342,10 +349,14 @@ void irq_enable(struct irq_desc *desc)
 			unmask_irq(desc);
 		}
 	}
+
+	hard_cond_local_irq_restore(flags);
 }
 
 static void __irq_disable(struct irq_desc *desc, bool mask)
 {
+	unsigned long flags = hard_cond_local_irq_save();
+
 	if (irqd_irq_disabled(&desc->irq_data)) {
 		if (mask)
 			mask_irq(desc);
@@ -358,6 +369,8 @@ static void __irq_disable(struct irq_desc *desc, bool mask)
 			mask_irq(desc);
 		}
 	}
+
+	hard_cond_local_irq_restore(flags);
 }
 
 /**
@@ -387,11 +400,13 @@ void irq_disable(struct irq_desc *desc)
 
 void irq_percpu_enable(struct irq_desc *desc, unsigned int cpu)
 {
+	unsigned long flags = hard_cond_local_irq_save();
 	if (desc->irq_data.chip->irq_enable)
 		desc->irq_data.chip->irq_enable(&desc->irq_data);
 	else
 		desc->irq_data.chip->irq_unmask(&desc->irq_data);
 	cpumask_set_cpu(cpu, desc->percpu_enabled);
+	hard_cond_local_irq_restore(flags);
 }
 
 void irq_percpu_disable(struct irq_desc *desc, unsigned int cpu)
@@ -428,12 +443,16 @@ void mask_irq(struct irq_desc *desc)
 
 void unmask_irq(struct irq_desc *desc)
 {
+	unsigned long flags;
+
 	if (!irqd_irq_masked(&desc->irq_data))
 		return;
 
 	if (desc->irq_data.chip->irq_unmask) {
+		flags = hard_cond_local_irq_save();
 		desc->irq_data.chip->irq_unmask(&desc->irq_data);
 		irq_state_clr_masked(desc);
+		hard_cond_local_irq_restore(flags);
 	}
 }
 
@@ -630,7 +649,9 @@ static void cond_unmask_irq(struct irq_desc *desc)
 void handle_level_irq(struct irq_desc *desc)
 {
 	raw_spin_lock(&desc->lock);
+#ifndef CONFIG_IPIPE
 	mask_ack_irq(desc);
+#endif
 
 	if (!irq_may_run(desc))
 		goto out_unlock;
@@ -666,7 +687,16 @@ static inline void preflow_handler(struct irq_desc *desc)
 static inline void preflow_handler(struct irq_desc *desc) { }
 #endif
 
-static void cond_unmask_eoi_irq(struct irq_desc *desc, struct irq_chip *chip)
+#ifdef CONFIG_IPIPE
+static void cond_release_fasteoi_irq(struct irq_desc *desc,
+				     struct irq_chip *chip)
+{
+	if (chip->irq_release &&
+	    !irqd_irq_disabled(&desc->irq_data) && !desc->threads_oneshot)
+		chip->irq_release(&desc->irq_data);
+}
+#else
+static inline void cond_unmask_eoi_irq(struct irq_desc *desc, struct irq_chip *chip)
 {
 	if (!(desc->istate & IRQS_ONESHOT)) {
 		chip->irq_eoi(&desc->irq_data);
@@ -686,6 +716,7 @@ static void cond_unmask_eoi_irq(struct irq_desc *desc, struct irq_chip *chip)
 		chip->irq_eoi(&desc->irq_data);
 	}
 }
+#endif /* !CONFIG_IPIPE */
 
 /**
  *	handle_fasteoi_irq - irq handler for transparent controllers
@@ -718,13 +749,23 @@ void handle_fasteoi_irq(struct irq_desc *desc)
 	}
 
 	kstat_incr_irqs_this_cpu(desc);
+#ifndef CONFIG_IPIPE
 	if (desc->istate & IRQS_ONESHOT)
 		mask_irq(desc);
+#endif
 
 	preflow_handler(desc);
 	handle_irq_event(desc);
 
+#ifdef CONFIG_IPIPE
+	/*
+	 * IRQCHIP_EOI_IF_HANDLED is ignored as the I-pipe always
+	 * sends EOI.
+	 */
+	cond_release_fasteoi_irq(desc, chip);
+#else  /* !CONFIG_IPIPE */
 	cond_unmask_eoi_irq(desc, chip);
+#endif	/* !CONFIG_IPIPE */
 
 	raw_spin_unlock(&desc->lock);
 	return;
@@ -808,7 +849,9 @@ void handle_edge_irq(struct irq_desc *desc)
 	kstat_incr_irqs_this_cpu(desc);
 
 	/* Start handling the irq */
+#ifndef CONFIG_IPIPE
 	desc->irq_data.chip->irq_ack(&desc->irq_data);
+#endif
 
 	do {
 		if (unlikely(!desc->action)) {
@@ -900,6 +943,11 @@ void handle_percpu_irq(struct irq_desc *desc)
 	 */
 	__kstat_incr_irqs_this_cpu(desc);
 
+#ifdef CONFIG_IPIPE
+	(void)chip;
+	handle_irq_event_percpu(desc);
+	desc->ipipe_end(desc);
+#else
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
 
@@ -907,6 +955,7 @@ void handle_percpu_irq(struct irq_desc *desc)
 
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);
+#endif
 }
 
 /**
@@ -933,13 +982,20 @@ void handle_percpu_devid_irq(struct irq_desc *desc)
 	 */
 	__kstat_incr_irqs_this_cpu(desc);
 
+#ifndef CONFIG_IPIPE
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
+#endif
 
 	if (likely(action)) {
 		trace_irq_handler_entry(irq, action);
 		res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
 		trace_irq_handler_exit(irq, action, res);
+#ifdef CONFIG_IPIPE
+		(void)chip;
+		desc->ipipe_end(desc);
+		return;
+#endif
 	} else {
 		unsigned int cpu = smp_processor_id();
 		bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
@@ -980,6 +1036,170 @@ void handle_percpu_devid_fasteoi_nmi(struct irq_desc *desc)
 		chip->irq_eoi(&desc->irq_data);
 }
 
+#ifdef CONFIG_IPIPE
+
+void __ipipe_ack_level_irq(struct irq_desc *desc)
+{
+	mask_ack_irq(desc);
+}
+
+void __ipipe_end_level_irq(struct irq_desc *desc)
+{
+	desc->irq_data.chip->irq_unmask(&desc->irq_data);
+}
+
+void __ipipe_ack_fasteoi_irq(struct irq_desc *desc)
+{
+	desc->irq_data.chip->irq_hold(&desc->irq_data);
+}
+
+void __ipipe_end_fasteoi_irq(struct irq_desc *desc)
+{
+	if (desc->irq_data.chip->irq_release)
+		desc->irq_data.chip->irq_release(&desc->irq_data);
+}
+
+void __ipipe_ack_edge_irq(struct irq_desc *desc)
+{
+	desc->irq_data.chip->irq_ack(&desc->irq_data);
+}
+
+void __ipipe_ack_percpu_irq(struct irq_desc *desc)
+{
+	if (desc->irq_data.chip->irq_ack)
+		desc->irq_data.chip->irq_ack(&desc->irq_data);
+
+	if (desc->irq_data.chip->irq_eoi)
+		desc->irq_data.chip->irq_eoi(&desc->irq_data);
+}
+
+void __ipipe_nop_irq(struct irq_desc *desc)
+{
+}
+
+void __ipipe_chained_irq(struct irq_desc *desc)
+{
+	/*
+	 * XXX: Do NOT fold this into __ipipe_nop_irq(), see
+	 * ipipe_chained_irq_p().
+	 */
+}
+
+static void __ipipe_ack_bad_irq(struct irq_desc *desc)
+{
+	handle_bad_irq(desc);
+	WARN_ON_ONCE(1);
+}
+
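+/*
+ * Select the pipeline ack/end handlers matching the flow handler
+ * installed for this descriptor, so the interrupt can be acknowledged
+ * as soon as it is received, and completed once the in-band flow
+ * handler has eventually run.
+ */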
+irq_flow_handler_t
+__ipipe_setup_irq_desc(struct irq_desc *desc, irq_flow_handler_t handle, int is_chained)
+{
+	if (unlikely(handle == NULL)) {
+		desc->ipipe_ack = __ipipe_ack_bad_irq;
+		desc->ipipe_end = __ipipe_nop_irq;
+	} else {
+		if (is_chained) {
+			desc->ipipe_ack = handle;
+			desc->ipipe_end = __ipipe_nop_irq;
+			handle = __ipipe_chained_irq;
+		} else if (handle == handle_simple_irq) {
+			desc->ipipe_ack = __ipipe_nop_irq;
+			desc->ipipe_end = __ipipe_nop_irq;
+		} else if (handle == handle_level_irq) {
+			desc->ipipe_ack = __ipipe_ack_level_irq;
+			desc->ipipe_end = __ipipe_end_level_irq;
+		} else if (handle == handle_edge_irq) {
+			desc->ipipe_ack = __ipipe_ack_edge_irq;
+			desc->ipipe_end = __ipipe_nop_irq;
+		} else if (handle == handle_fasteoi_irq) {
+			desc->ipipe_ack = __ipipe_ack_fasteoi_irq;
+			desc->ipipe_end = __ipipe_end_fasteoi_irq;
+		} else if (handle == handle_percpu_irq ||
+			   handle == handle_percpu_devid_irq) {
+			if (irq_desc_get_chip(desc) &&
+			    irq_desc_get_chip(desc)->irq_hold) {
+				desc->ipipe_ack = __ipipe_ack_fasteoi_irq;
+				desc->ipipe_end = __ipipe_end_fasteoi_irq;
+			} else {
+				desc->ipipe_ack = __ipipe_ack_percpu_irq;
+				desc->ipipe_end = __ipipe_nop_irq;
+			}
+		} else if (irq_desc_get_chip(desc) == &no_irq_chip) {
+			desc->ipipe_ack = __ipipe_nop_irq;
+			desc->ipipe_end = __ipipe_nop_irq;
+		} else {
+			desc->ipipe_ack = __ipipe_ack_bad_irq;
+			desc->ipipe_end = __ipipe_nop_irq;
+		}
+	}
+
+	/*
+	 * Lazy IRQ disabling is not supported: we neither track nor
+	 * update the descriptor state bits it relies on, so force
+	 * unlazy disabling for this descriptor.
+	 */
+	irq_settings_clr_and_set(desc, 0, _IRQ_DISABLE_UNLAZY);
+
+	/* Suppress intermediate trampoline routine. */
+	ipipe_root_domain->irqs[desc->irq_data.irq].ackfn = desc->ipipe_ack;
+
+	return handle;
+}
+
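+/*
+ * Enable an interrupt line on behalf of a pipeline client: perform
+ * the deferred chip startup if the line was never started (or was
+ * shut down), otherwise simply enable/unmask it at chip level.
+ */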
+int ipipe_enable_irq(unsigned int irq)
+{
+	struct irq_desc *desc;
+	struct irq_chip *chip;
+	unsigned long flags;
+	int err;
+
+	desc = irq_to_desc(irq);
+	if (desc == NULL)
+		return -EINVAL;
+
+	chip = irq_desc_get_chip(desc);
+
+	if (chip->irq_startup && (desc->istate & IPIPE_IRQS_NEEDS_STARTUP)) {
+
+		ipipe_root_only();
+
+		err = irq_activate(desc);
+		if (err)
+			return err;
+
+		raw_spin_lock_irqsave(&desc->lock, flags);
+		if (desc->istate & IPIPE_IRQS_NEEDS_STARTUP) {
+			desc->istate &= ~IPIPE_IRQS_NEEDS_STARTUP;
+			chip->irq_startup(&desc->irq_data);
+		}
+		raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+		return 0;
+	}
+
+	if (chip->irq_enable == NULL && chip->irq_unmask == NULL)
+		return -ENOSYS;
+
+	if (chip->irq_enable)
+		chip->irq_enable(&desc->irq_data);
+	else
+		chip->irq_unmask(&desc->irq_data);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ipipe_enable_irq);
+
+#else /* !CONFIG_IPIPE */
+
+irq_flow_handler_t
+__ipipe_setup_irq_desc(struct irq_desc *desc, irq_flow_handler_t handle, int is_chained)
+{
+	return handle;
+}
+
+#endif /* !CONFIG_IPIPE */
+EXPORT_SYMBOL_GPL(__ipipe_setup_irq_desc);
+
 static void
 __irq_do_set_handler(struct irq_desc *desc, irq_flow_handler_t handle,
 		     int is_chained, const char *name)
@@ -1014,6 +1234,8 @@ __irq_do_set_handler(struct irq_desc *desc, irq_flow_handler_t handle,
 			return;
 	}
 
+	handle = __ipipe_setup_irq_desc(desc, handle, is_chained);
+
 	/* Uninstall? */
 	if (handle == handle_bad_irq) {
 		if (desc->irq_data.chip != &no_irq_chip)
@@ -1349,6 +1571,20 @@ void irq_chip_mask_parent(struct irq_data *data)
 }
 EXPORT_SYMBOL_GPL(irq_chip_mask_parent);
 
+#ifdef CONFIG_IPIPE
+void irq_chip_hold_parent(struct irq_data *data)
+{
+	data = data->parent_data;
+	data->chip->irq_hold(data);
+}
+
+void irq_chip_release_parent(struct irq_data *data)
+{
+	data = data->parent_data;
+	data->chip->irq_release(data);
+}
+#endif
+
 /**
  * irq_chip_mask_ack_parent - Mask and acknowledge the parent interrupt
  * @data:	Pointer to interrupt specific data
diff --git a/kernel/irq/dummychip.c b/kernel/irq/dummychip.c
index 0b0cdf206dc4..7bf8cbee1b87 100644
--- a/kernel/irq/dummychip.c
+++ b/kernel/irq/dummychip.c
@@ -43,7 +43,7 @@ struct irq_chip no_irq_chip = {
 	.irq_enable	= noop,
 	.irq_disable	= noop,
 	.irq_ack	= ack_bad,
-	.flags		= IRQCHIP_SKIP_SET_WAKE,
+	.flags		= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
 };
 
 /*
@@ -59,6 +59,6 @@ struct irq_chip dummy_irq_chip = {
 	.irq_ack	= noop,
 	.irq_mask	= noop,
 	.irq_unmask	= noop,
-	.flags		= IRQCHIP_SKIP_SET_WAKE,
+	.flags		= IRQCHIP_SKIP_SET_WAKE | IRQCHIP_PIPELINE_SAFE,
 };
 EXPORT_SYMBOL_GPL(dummy_irq_chip);
diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index e2999a070a99..abb9d41475c7 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -37,12 +37,13 @@ void irq_gc_mask_disable_reg(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	irq_reg_writel(gc, mask, ct->regs.disable);
 	*ct->mask_cache &= ~mask;
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 /**
@@ -56,12 +57,13 @@ void irq_gc_mask_set_bit(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	*ct->mask_cache |= mask;
 	irq_reg_writel(gc, *ct->mask_cache, ct->regs.mask);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 EXPORT_SYMBOL_GPL(irq_gc_mask_set_bit);
 
@@ -76,12 +78,13 @@ void irq_gc_mask_clr_bit(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	*ct->mask_cache &= ~mask;
 	irq_reg_writel(gc, *ct->mask_cache, ct->regs.mask);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 EXPORT_SYMBOL_GPL(irq_gc_mask_clr_bit);
 
@@ -96,12 +99,13 @@ void irq_gc_unmask_enable_reg(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	irq_reg_writel(gc, mask, ct->regs.enable);
 	*ct->mask_cache |= mask;
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 /**
@@ -112,11 +116,12 @@ void irq_gc_ack_set_bit(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	irq_reg_writel(gc, mask, ct->regs.ack);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 EXPORT_SYMBOL_GPL(irq_gc_ack_set_bit);
 
@@ -128,11 +133,12 @@ void irq_gc_ack_clr_bit(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = ~d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	irq_reg_writel(gc, mask, ct->regs.ack);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 /**
@@ -151,13 +157,14 @@ void irq_gc_mask_disable_and_ack_set(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	irq_reg_writel(gc, mask, ct->regs.disable);
 	*ct->mask_cache &= ~mask;
 	irq_reg_writel(gc, mask, ct->regs.ack);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 /**
@@ -168,11 +175,12 @@ void irq_gc_eoi(struct irq_data *d)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
 	struct irq_chip_type *ct = irq_data_get_chip_type(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	irq_reg_writel(gc, mask, ct->regs.eoi);
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 }
 
 /**
@@ -187,17 +195,18 @@ void irq_gc_eoi(struct irq_data *d)
 int irq_gc_set_wake(struct irq_data *d, unsigned int on)
 {
 	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+	unsigned long flags;
 	u32 mask = d->mask;
 
 	if (!(mask & gc->wake_enabled))
 		return -EINVAL;
 
-	irq_gc_lock(gc);
+	flags = irq_gc_lock(gc);
 	if (on)
 		gc->wake_active |= mask;
 	else
 		gc->wake_active &= ~mask;
-	irq_gc_unlock(gc);
+	irq_gc_unlock(gc, flags);
 	return 0;
 }
 
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index c9d8eb7f5c02..0e385447d962 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -62,6 +62,7 @@ enum {
 	IRQS_SUSPENDED		= 0x00000800,
 	IRQS_TIMINGS		= 0x00001000,
 	IRQS_NMI		= 0x00002000,
+	IPIPE_IRQS_NEEDS_STARTUP= 0x80000000,
 };
 
 #include "debug.h"
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 9be995fc3c5a..a3798c24de3d 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -125,6 +125,9 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
 	for_each_possible_cpu(cpu)
 		*per_cpu_ptr(desc->kstat_irqs, cpu) = 0;
 	desc_smp_init(desc, node, affinity);
+#ifdef CONFIG_IPIPE
+	desc->istate |= IPIPE_IRQS_NEEDS_STARTUP;
+#endif
 }
 
 int nr_irqs = NR_IRQS;
@@ -578,11 +581,13 @@ int __init early_irq_init(void)
 	return arch_early_irq_init();
 }
 
+#ifndef CONFIG_IPIPE
 struct irq_desc *irq_to_desc(unsigned int irq)
 {
 	return (irq < NR_IRQS) ? irq_desc + irq : NULL;
 }
 EXPORT_SYMBOL(irq_to_desc);
+#endif /* !CONFIG_IPIPE */
 
 static void free_desc(unsigned int irq)
 {
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 3b1d0a4725a4..293e01658d63 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -959,9 +959,14 @@ again:
 
 	desc->threads_oneshot &= ~action->thread_mask;
 
+#ifndef CONFIG_IPIPE
 	if (!desc->threads_oneshot && !irqd_irq_disabled(&desc->irq_data) &&
 	    irqd_irq_masked(&desc->irq_data))
 		unmask_threaded_irq(desc);
+#else /* CONFIG_IPIPE */
+	if (!desc->threads_oneshot && !irqd_irq_disabled(&desc->irq_data))
+		desc->ipipe_end(desc);
+#endif /* CONFIG_IPIPE */
 
 out_unlock:
 	raw_spin_unlock_irq(&desc->lock);
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index eb95f6106a1e..0d3dc721bce2 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -268,6 +268,9 @@ static void msi_domain_update_chip_ops(struct msi_domain_info *info)
 	struct irq_chip *chip = info->chip;
 
 	BUG_ON(!chip || !chip->irq_mask || !chip->irq_unmask);
+	WARN_ONCE(IS_ENABLED(CONFIG_IPIPE) &&
+		  (chip->flags & IRQCHIP_PIPELINE_SAFE) == 0,
+		  "MSI domain irqchip %s is not pipeline-safe!", chip->name);
 	if (!chip->irq_set_affinity)
 		chip->irq_set_affinity = msi_domain_set_affinity;
 }
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bca0f7f71cde..a5099743ba31 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3419,7 +3419,7 @@ void lockdep_hardirqs_on(unsigned long ip)
 	 * already enabled, yet we find the hardware thinks they are in fact
 	 * enabled.. someone messed up their IRQ state tracing.
 	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
 		return;
 
 	/*
@@ -3446,7 +3446,9 @@ NOKPROBE_SYMBOL(lockdep_hardirqs_on);
  */
 void lockdep_hardirqs_off(unsigned long ip)
 {
-	struct task_struct *curr = current;
+	struct task_struct *curr;
+
+	curr = current;
 
 	if (unlikely(!debug_locks || current->lockdep_recursion))
 		return;
@@ -3455,7 +3457,7 @@ void lockdep_hardirqs_off(unsigned long ip)
 	 * So we're supposed to get called after you mask local IRQs, but for
 	 * some reason the hardware doesn't quite think you did a proper job.
 	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
 		return;
 
 	if (curr->hardirqs_enabled) {
@@ -3485,7 +3487,7 @@ void trace_softirqs_on(unsigned long ip)
 	 * We fancy IRQs being disabled here, see softirq.c, avoids
 	 * funny state and nesting things.
 	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
 		return;
 
 	if (curr->softirqs_enabled) {
@@ -3524,7 +3526,7 @@ void trace_softirqs_off(unsigned long ip)
 	/*
 	 * We fancy IRQs being disabled here, see softirq.c
 	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled() && !hard_irqs_disabled()))
 		return;
 
 	if (curr->softirqs_enabled) {
diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index a525368b8cf6..ce2755b52821 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -202,12 +202,12 @@ extern struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 	this_cpu_inc(lockdep_stats.ptr);
 
 #define debug_atomic_inc(ptr)			{		\
-	WARN_ON_ONCE(!irqs_disabled());				\
+	WARN_ON_ONCE(!hard_irqs_disabled() && !irqs_disabled()); \
 	__this_cpu_inc(lockdep_stats.ptr);			\
 }
 
 #define debug_atomic_dec(ptr)			{		\
-	WARN_ON_ONCE(!irqs_disabled());				\
+	WARN_ON_ONCE(!hard_irqs_disabled() && !irqs_disabled());\
 	__this_cpu_dec(lockdep_stats.ptr);			\
 }
 
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 0ff08380f531..7d7a34aa3e40 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -34,7 +34,9 @@ EXPORT_PER_CPU_SYMBOL(__mmiowb_state);
  * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
  * not re-enabled during lock-acquire (which the preempt-spin-ops do):
  */
-#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+#if !defined(CONFIG_GENERIC_LOCKBREAK) ||			\
+	defined(CONFIG_DEBUG_LOCK_ALLOC) ||			\
+	defined(CONFIG_IPIPE)
 /*
  * The __lock_function inlines are taken from
  * spinlock : include/linux/spinlock_api_smp.h
diff --git a/kernel/module.c b/kernel/module.c
index 45513909b01d..a6f653f1ff5f 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1128,7 +1128,7 @@ bool try_module_get(struct module *module)
 	bool ret = true;
 
 	if (module) {
-		preempt_disable();
+		unsigned long flags = hard_preempt_disable();
 		/* Note: here, we can fail to get a reference */
 		if (likely(module_is_live(module) &&
 			   atomic_inc_not_zero(&module->refcnt) != 0))
@@ -1136,7 +1136,7 @@ bool try_module_get(struct module *module)
 		else
 			ret = false;
 
-		preempt_enable();
+		hard_preempt_enable(flags);
 	}
 	return ret;
 }
@@ -1147,11 +1147,11 @@ void module_put(struct module *module)
 	int ret;
 
 	if (module) {
-		preempt_disable();
+		unsigned long flags = hard_preempt_disable();
 		ret = atomic_dec_if_positive(&module->refcnt);
 		WARN_ON(ret < 0);	/* Failed to put refcount */
 		trace_module_put(module, _RET_IP_);
-		preempt_enable();
+		hard_preempt_enable(flags);
 	}
 }
 EXPORT_SYMBOL(module_put);
diff --git a/kernel/notifier.c b/kernel/notifier.c
index f6d5ffe4e72e..e1c427bfacc8 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -6,6 +6,7 @@
 #include <linux/rcupdate.h>
 #include <linux/vmalloc.h>
 #include <linux/reboot.h>
+#include <linux/ipipe.h>
 
 /*
  *	Notifier list for kernel code which wants to be called
@@ -195,6 +196,9 @@ NOKPROBE_SYMBOL(__atomic_notifier_call_chain);
 int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
 			       unsigned long val, void *v)
 {
+	if (!ipipe_root_p)
+		return notifier_call_chain(&nh->head, val, v, -1, NULL);
+
 	return __atomic_notifier_call_chain(nh, val, v, -1, NULL);
 }
 EXPORT_SYMBOL_GPL(atomic_notifier_call_chain);
diff --git a/kernel/panic.c b/kernel/panic.c
index f470a038b05b..36d2c76de959 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -22,8 +22,10 @@
 #include <linux/ftrace.h>
 #include <linux/reboot.h>
 #include <linux/delay.h>
+#include <linux/ipipe_trace.h>
 #include <linux/kexec.h>
 #include <linux/sched.h>
+#include <linux/ipipe.h>
 #include <linux/sysrq.h>
 #include <linux/init.h>
 #include <linux/nmi.h>
@@ -513,6 +515,8 @@ void oops_enter(void)
 {
 	tracing_off();
 	/* can't trust the integrity of the kernel anymore: */
+	ipipe_trace_panic_freeze();
+	ipipe_disable_context_check();
 	debug_locks_off();
 	do_oops_enter_exit();
 }
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 69c4cd472def..5bd515d0ea33 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -290,6 +290,7 @@ static int create_image(int platform_mode)
 		goto Enable_cpus;
 
 	local_irq_disable();
+	hard_cond_local_irq_disable();
 
 	system_state = SYSTEM_SUSPEND;
 
@@ -457,6 +458,7 @@ static int resume_target_kernel(bool platform_mode)
 
 	local_irq_disable();
 	system_state = SYSTEM_SUSPEND;
+	hard_cond_local_irq_disable();
 
 	error = syscore_suspend();
 	if (error)
@@ -578,6 +580,7 @@ int hibernation_platform_enter(void)
 
 	local_irq_disable();
 	system_state = SYSTEM_SUSPEND;
+	hard_cond_local_irq_disable();
 	syscore_suspend();
 	if (pm_wakeup_pending()) {
 		error = -EAGAIN;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5569ef6bc183..5738ea4b5547 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -40,6 +40,7 @@
 #include <linux/kmsg_dump.h>
 #include <linux/syslog.h>
 #include <linux/cpu.h>
+#include <linux/ipipe.h>
 #include <linux/rculist.h>
 #include <linux/poll.h>
 #include <linux/irq_work.h>
@@ -2031,10 +2032,116 @@ asmlinkage int vprintk_emit(int facility, int level,
 }
 EXPORT_SYMBOL(vprintk_emit);
 
-asmlinkage int vprintk(const char *fmt, va_list args)
+#ifdef CONFIG_IPIPE
+
+extern int __ipipe_printk_bypass;
+
+static IPIPE_DEFINE_SPINLOCK(__ipipe_printk_lock);
+
+static int __ipipe_printk_fill;
+
+static char __ipipe_printk_buf[__LOG_BUF_LEN];
+
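+/*
+ * Buffer a message which cannot be handed over to the regular printk
+ * machinery from the current context; the pending output is flushed
+ * later from the root stage by __ipipe_flush_printk(), triggered via
+ * the printk virtual IRQ.
+ */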
+int __ipipe_log_printk(const char *fmt, va_list args)
+{
+	int ret = 0, fbytes, oldcount;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&__ipipe_printk_lock, flags);
+
+	oldcount = __ipipe_printk_fill;
+	fbytes = __LOG_BUF_LEN - oldcount;
+	if (fbytes > 1)	{
+		ret = vscnprintf(__ipipe_printk_buf + __ipipe_printk_fill,
+				 fbytes, fmt, args) + 1;
+		__ipipe_printk_fill += ret;
+	}
+
+	raw_spin_unlock_irqrestore(&__ipipe_printk_lock, flags);
+
+	if (oldcount == 0)
+		ipipe_raise_irq(__ipipe_printk_virq);
+
+	return ret;
+}
+
+static void do_deferred_vprintk(const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	vprintk_func(fmt, args);
+	va_end(args);
+}
+
+void __ipipe_flush_printk(unsigned virq, void *cookie)
+{
+	char *p = __ipipe_printk_buf;
+	int len, lmax, out = 0;
+	unsigned long flags;
+
+	goto start;
+	do {
+		raw_spin_unlock_irqrestore(&__ipipe_printk_lock, flags);
+start:
+		lmax = __ipipe_printk_fill;
+		while (out < lmax) {
+			len = strlen(p) + 1;
+			do_deferred_vprintk("%s", p);
+			p += len;
+			out += len;
+		}
+		raw_spin_lock_irqsave(&__ipipe_printk_lock, flags);
+	} while (__ipipe_printk_fill != lmax);
+
+	__ipipe_printk_fill = 0;
+
+	raw_spin_unlock_irqrestore(&__ipipe_printk_lock, flags);
+}
+
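+/*
+ * Decide whether the message may be passed to the regular vprintk
+ * machinery right away (root stage running and head stage not
+ * stalled), or must be deferred to the log buffer above until the
+ * root stage can flush it safely.
+ */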
+static int do_vprintk(const char *fmt, va_list args)
+{
+	int sprintk = 1, cs = -1;
+	unsigned long flags;
+	int ret;
+
+	flags = hard_local_irq_save();
+
+	if (__ipipe_printk_bypass || oops_in_progress)
+		cs = ipipe_disable_context_check();
+	else if (__ipipe_current_domain == ipipe_root_domain) {
+		if (ipipe_head_domain != ipipe_root_domain &&
+		    (raw_irqs_disabled_flags(flags) ||
+		     test_bit(IPIPE_STALL_FLAG, &__ipipe_head_status)))
+			sprintk = 0;
+	} else
+		sprintk = 0;
+
+	hard_local_irq_restore(flags);
+
+	if (sprintk) {
+		ret = vprintk_func(fmt, args);
+		if (cs != -1)
+			ipipe_restore_context_check(cs);
+	} else
+		ret = __ipipe_log_printk(fmt, args);
+
+	return ret;
+}
+
+#else /* !CONFIG_IPIPE */
+
+static int do_vprintk(const char *fmt, va_list args)
 {
 	return vprintk_func(fmt, args);
 }
+
+#endif /* !CONFIG_IPIPE */
+
+asmlinkage int vprintk(const char *fmt, va_list args)
+{
+	return do_vprintk(fmt, args);
+}
 EXPORT_SYMBOL(vprintk);
 
 int vprintk_default(const char *fmt, va_list args)
@@ -2081,7 +2188,7 @@ asmlinkage __visible int printk(const char *fmt, ...)
 	int r;
 
 	va_start(args, fmt);
-	r = vprintk_func(fmt, args);
+	r = do_vprintk(fmt, args);
 	va_end(args);
 
 	return r;
@@ -2142,6 +2249,63 @@ asmlinkage __visible void early_printk(const char *fmt, ...)
 }
 #endif
 
+#ifdef CONFIG_RAW_PRINTK
+static struct console *raw_console;
+static IPIPE_DEFINE_RAW_SPINLOCK(raw_console_lock);
+
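+/*
+ * raw_(v)printk() formats the message on the stack and pushes it
+ * directly to the current raw console through its ->write_raw()
+ * handler, bypassing the printk log buffer so that output is never
+ * deferred.
+ */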
+void raw_vprintk(const char *fmt, va_list ap)
+{
+	unsigned long flags;
+	char buf[256];
+	int n;
+
+	if (raw_console == NULL || console_suspended)
+		return;
+
+	n = vscnprintf(buf, sizeof(buf), fmt, ap);
+	touch_nmi_watchdog();
+	raw_spin_lock_irqsave(&raw_console_lock, flags);
+	if (raw_console)
+		raw_console->write_raw(raw_console, buf, n);
+	raw_spin_unlock_irqrestore(&raw_console_lock, flags);
+}
+
+asmlinkage __visible void raw_printk(const char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	raw_vprintk(fmt, ap);
+	va_end(ap);
+}
+EXPORT_SYMBOL(raw_printk);
+
+static inline void register_raw_console(struct console *newcon)
+{
+	if ((newcon->flags & CON_RAW) != 0 && newcon->write_raw)
+		raw_console = newcon;
+}
+
+static inline void unregister_raw_console(struct console *oldcon)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&raw_console_lock, flags);
+	if (oldcon == raw_console)
+		raw_console = NULL;
+	raw_spin_unlock_irqrestore(&raw_console_lock, flags);
+}
+
+#else
+
+static inline void register_raw_console(struct console *newcon)
+{ }
+
+static inline void unregister_raw_console(struct console *oldcon)
+{ }
+
+#endif
+
 static int __add_preferred_console(char *name, int idx, char *options,
 				   char *brl_options)
 {
@@ -2792,6 +2956,9 @@ void register_console(struct console *newcon)
 		console_drivers->next = newcon;
 	}
 
+	/* The most recently registered raw console becomes the current one. */
+	register_raw_console(newcon);
+
 	if (newcon->flags & CON_EXTENDED)
 		nr_ext_console_drivers++;
 
@@ -2851,6 +3018,8 @@ int unregister_console(struct console *console)
 		(console->flags & CON_BOOT) ? "boot" : "" ,
 		console->name, console->index);
 
+	unregister_raw_console(console);
+
 	res = _braille_unregister_console(console);
 	if (res)
 		return res;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..af34b69c0eed 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -835,6 +835,8 @@ static int ptrace_resume(struct task_struct *child, long request,
 		user_disable_single_step(child);
 	}
 
+	__ipipe_report_ptrace_resume(child, request);
+
 	/*
 	 * Change ->exit_code and ->state under siglock to avoid the race
 	 * with wait_task_stopped() in between; a non-zero ->exit_code will
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 4aa02eee8f6c..97216dbbdeee 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -6,7 +6,7 @@
 menu "RCU Debugging"
 
 config PROVE_RCU
-	def_bool PROVE_LOCKING
+	def_bool PROVE_LOCKING && !IPIPE
 
 config PROVE_RCU_LIST
 	bool "RCU list lockdep debugging"
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4511532b08b8..2afa946b6017 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1759,8 +1759,12 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	}
 
 	/* Can the task run on the task's current CPU? If so, we're done */
-	if (cpumask_test_cpu(task_cpu(p), new_mask))
+	if (cpumask_test_cpu(task_cpu(p), new_mask)) {
+		__ipipe_report_setaffinity(p, task_cpu(p));
 		goto out;
+	}
+
+	__ipipe_report_setaffinity(p, dest_cpu);
 
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
@@ -2418,7 +2422,9 @@ void scheduler_ipi(void)
 	 * however a fair share of IPIs are still resched only so this would
 	 * somewhat pessimize the simple resched case.
 	 */
+#ifndef IPIPE_ARCH_HAVE_VIRQ_IPI
 	irq_enter();
+#endif
 	sched_ttwu_pending();
 
 	/*
@@ -2428,7 +2434,9 @@ void scheduler_ipi(void)
 		this_rq()->idle_balance = 1;
 		raise_softirq_irqoff(SCHED_SOFTIRQ);
 	}
+#ifndef IPIPE_ARCH_HAVE_VIRQ_IPI
 	irq_exit();
+#endif
 }
 
 static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
@@ -2634,7 +2642,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	 */
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	smp_mb__after_spinlock();
-	if (!(p->state & state))
+	if (!(p->state & state) || (p->state & (TASK_NOWAKEUP|TASK_HARDENING)))
 		goto unlock;
 
 	trace_sched_waking(p);
@@ -3402,6 +3410,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 	 * PREEMPT_COUNT kernels).
 	 */
 
+	__ipipe_complete_domain_migration();
 	rq = finish_task_switch(prev);
 	balance_callback(rq);
 	preempt_enable();
@@ -3470,6 +3479,9 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	switch_to(prev, next, prev);
 	barrier();
 
+	if (unlikely(__ipipe_switch_tail()))
+		return NULL;
+
 	return finish_task_switch(prev);
 }
 
@@ -3962,6 +3974,7 @@ static noinline void __schedule_bug(struct task_struct *prev)
  */
 static inline void schedule_debug(struct task_struct *prev, bool preempt)
 {
+	ipipe_root_only();
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	if (task_stack_end_corrupted(prev))
 		panic("corrupted stack end detected inside scheduler\n");
@@ -4084,7 +4097,7 @@ restart:
  *
  * WARNING: must be called with preemption disabled!
  */
-static void __sched notrace __schedule(bool preempt)
+static bool __sched notrace __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
@@ -4165,12 +4178,17 @@ static void __sched notrace __schedule(bool preempt)
 
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
+		if (rq == NULL)
+			return true; /* task hijacked by head domain */
 	} else {
+		prev->state &= ~TASK_HARDENING;
 		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 		rq_unlock_irq(rq, &rf);
 	}
 
 	balance_callback(rq);
+
+	return false;
 }
 
 void __noreturn do_task_dead(void)
@@ -4232,7 +4250,8 @@ asmlinkage __visible void __sched schedule(void)
 	sched_submit_work(tsk);
 	do {
 		preempt_disable();
-		__schedule(false);
+		if (__schedule(false))
+			return;
 		sched_preempt_enable_no_resched();
 	} while (need_resched());
 	sched_update_worker(tsk);
@@ -4313,7 +4332,8 @@ static void __sched notrace preempt_schedule_common(void)
 		 */
 		preempt_disable_notrace();
 		preempt_latency_start(1);
-		__schedule(true);
+		if (__schedule(true))
+			return;
 		preempt_latency_stop(1);
 		preempt_enable_no_resched_notrace();
 
@@ -4335,7 +4355,7 @@ asmlinkage __visible void __sched notrace preempt_schedule(void)
 	 * If there is a non-zero preempt_count or interrupts are disabled,
 	 * we do not want to preempt the current task. Just return..
 	 */
-	if (likely(!preemptible()))
+	if (likely(!preemptible() || !ipipe_root_p))
 		return;
 
 	preempt_schedule_common();
@@ -4361,7 +4381,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
 {
 	enum ctx_state prev_ctx;
 
-	if (likely(!preemptible()))
+	if (likely(!preemptible() || !ipipe_root_p || hard_irqs_disabled()))
 		return;
 
 	do {
@@ -5067,6 +5087,7 @@ change:
 
 	__setscheduler(rq, p, attr, pi);
 	__setscheduler_uclamp(p, attr);
+	__ipipe_report_setsched(p);
 
 	if (queued) {
 		/*
@@ -6631,6 +6652,43 @@ int in_sched_functions(unsigned long addr)
 		&& addr < (unsigned long)__sched_text_end);
 }
 
+#ifdef CONFIG_IPIPE
+
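+/*
+ * Schedule the current task out of the root stage on behalf of the
+ * head domain: the task is marked as hijacked and put to sleep, so
+ * that the head domain can resume running it from its own scheduling
+ * context once __schedule() has switched it out.
+ */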
+int __ipipe_migrate_head(void)
+{
+	struct task_struct *p = current;
+
+	preempt_disable();
+
+	IPIPE_WARN_ONCE(__this_cpu_read(ipipe_percpu.task_hijacked) != NULL);
+
+	__this_cpu_write(ipipe_percpu.task_hijacked, p);
+	set_current_state(TASK_INTERRUPTIBLE | TASK_HARDENING);
+	sched_submit_work(p);
+	if (likely(__schedule(false)))
+		return 0;
+
+	preempt_enable();
+	return -ERESTARTSYS;
+}
+EXPORT_SYMBOL_GPL(__ipipe_migrate_head);
+
+void __ipipe_reenter_root(void)
+{
+	struct rq *rq;
+	struct task_struct *p;
+
+	p = __this_cpu_read(ipipe_percpu.rqlock_owner);
+	BUG_ON(p == NULL);
+	ipipe_clear_thread_flag(TIP_HEAD);
+	rq = finish_task_switch(p);
+	balance_callback(rq);
+	preempt_enable_no_resched_notrace();
+}
+EXPORT_SYMBOL_GPL(__ipipe_reenter_root);
+
+#endif /* CONFIG_IPIPE */
+
 #ifdef CONFIG_CGROUP_SCHED
 /*
  * Default task group.
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 131e7c86cf06..45114e62c792 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -77,22 +77,29 @@ void __weak arch_cpu_idle_dead(void) { }
 void __weak arch_cpu_idle(void)
 {
 	cpu_idle_force_poll = 1;
-	local_irq_enable();
+	local_irq_enable_full();
 }
 
 /**
  * default_idle_call - Default CPU idle routine.
  *
  * To use when the cpuidle framework cannot be used.
+ *
+ * When interrupts are pipelined, this call is entered with hard irqs
+ * on and the root stage stalled, and returns with hard irqs still on
+ * and the root stage unstalled.
  */
 void __cpuidle default_idle_call(void)
 {
 	if (current_clr_polling_and_test()) {
-		local_irq_enable();
+		local_irq_enable_full();
 	} else {
-		stop_critical_timings();
-		arch_cpu_idle();
-		start_critical_timings();
+		if (ipipe_enter_cpuidle(NULL, NULL)) {
+			stop_critical_timings();
+			arch_cpu_idle();
+			start_critical_timings();
+		} else
+			local_irq_enable_full();
 	}
 }
 
@@ -208,6 +215,15 @@ static void cpuidle_idle_call(void)
 exit_idle:
 	__current_set_polling();
 
+#ifdef CONFIG_IPIPE
+	/*
+	 *  Catch mishandling of the CPU's interrupt disable flag when
+	 *  pipelining IRQs.
+	 */
+	if (WARN_ON_ONCE(hard_irqs_disabled()))
+		hard_local_irq_enable();
+#endif
+
 	/*
 	 * It is up to the idle functions to reenable local interrupts
 	 */
@@ -261,6 +277,9 @@ static void do_idle(void)
 			cpu_idle_poll();
 		} else {
 			cpuidle_idle_call();
+#ifdef CONFIG_IPIPE
+			WARN_ON_ONCE(hard_irqs_disabled());
+#endif
 		}
 		arch_cpu_idle_exit();
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e7590813844..ffd150846782 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -65,6 +65,7 @@
 #include <linux/syscalls.h>
 #include <linux/task_work.h>
 #include <linux/tsacct_kern.h>
+#include <linux/ipipe.h>
 
 #include <asm/tlb.h>
 
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index c1e566a114ca..13299c86bd87 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -80,6 +80,8 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
 	} else
 		curr = list_first_entry(&wq_head->head, wait_queue_entry_t, entry);
 
+	ipipe_root_only();
+
 	if (&curr->entry == &wq_head->head)
 		return nr_exclusive;
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 8c97fc72d78b..0cc2d215f0a2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -35,6 +35,7 @@
 #include <linux/tracehook.h>
 #include <linux/capability.h>
 #include <linux/freezer.h>
+#include <linux/ipipe.h>
 #include <linux/pid_namespace.h>
 #include <linux/nsproxy.h>
 #include <linux/user_namespace.h>
@@ -760,6 +761,10 @@ still_pending:
 void signal_wake_up_state(struct task_struct *t, unsigned int state)
 {
 	set_tsk_thread_flag(t, TIF_SIGPENDING);
+
+	/* TIF_SIGPENDING must be set prior to reporting the wakeup. */
+	__ipipe_report_sigwake(t);
+
 	/*
 	 * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
 	 * case. We don't check t->state here because there is a race with it
@@ -981,8 +986,11 @@ static inline bool wants_signal(int sig, struct task_struct *p)
 	if (sig == SIGKILL)
 		return true;
 
-	if (task_is_stopped_or_traced(p))
+	if (task_is_stopped_or_traced(p)) {
+		if (!signal_pending(p))
+			__ipipe_report_sigwake(p);
 		return false;
+	}
 
 	return task_curr(p) || !signal_pending(p);
 }
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 998d50ee2d9b..d93f9c9d63d1 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -237,6 +237,7 @@ static int multi_cpu_stop(void *data)
 		}
 	} while (curstate != MULTI_STOP_EXIT);
 
+	hard_irq_enable();
 	local_irq_restore(flags);
 	return err;
 }
@@ -618,6 +619,7 @@ int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
 		local_irq_save(flags);
 		hard_irq_disable();
 		ret = (*fn)(data);
+		hard_irq_enable();
 		local_irq_restore(flags);
 
 		return ret;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index f5490222e134..ba06c41e901e 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/smp.h>
 #include <linux/device.h>
+#include <linux/ipipe_tickdev.h>
 
 #include "tick-internal.h"
 
@@ -448,6 +449,8 @@ void clockevents_register_device(struct clock_event_device *dev)
 	/* Initialize state to DETACHED */
 	clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED);
 
+	ipipe_host_timer_register(dev);
+
 	if (!dev->cpumask) {
 		WARN_ON(num_possible_cpus() > 1);
 		dev->cpumask = cpumask_of(smp_processor_id());
@@ -642,8 +645,10 @@ void tick_cleanup_dead_cpu(int cpu)
 	 * Unregister the clock event devices which were
 	 * released from the users in the notify chain.
 	 */
-	list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
+	list_for_each_entry_safe(dev, tmp, &clockevents_released, list) {
 		list_del(&dev->list);
+		ipipe_host_timer_cleanup(dev);
+	}
 	/*
 	 * Now check whether the CPU has left unused per cpu devices
 	 */
@@ -653,6 +658,7 @@ void tick_cleanup_dead_cpu(int cpu)
 		    !tick_is_broadcast_device(dev)) {
 			BUG_ON(!clockevent_state_detached(dev));
 			list_del(&dev->list);
+			ipipe_host_timer_cleanup(dev);
 		}
 	}
 	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a3ae244b1bcd..e3196725dbbf 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -21,6 +21,7 @@
 #include <linux/kernel_stat.h>
 #include <linux/export.h>
 #include <linux/interrupt.h>
+#include <linux/ipipe.h>
 #include <linux/percpu.h>
 #include <linux/init.h>
 #include <linux/mm.h>
@@ -1724,6 +1725,15 @@ static inline int collect_expired_timers(struct timer_base *base,
 }
 #endif
 
+static inline void do_account_tick(struct task_struct *p, int user_tick)
+{
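+	/*
+	 * Skip CPU time accounting for a tick which did not fire in
+	 * root domain context, as reported by the pipeline's per-CPU
+	 * tick state.
+	 */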
+#ifdef CONFIG_IPIPE
+	if (!__ipipe_root_tick_p(raw_cpu_ptr(&ipipe_percpu.tick_regs)))
+		return;
+#endif
+	account_process_tick(p, user_tick);
+}
+
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
@@ -1733,7 +1743,7 @@ void update_process_times(int user_tick)
 	struct task_struct *p = current;
 
 	/* Note: this timer irq context must be accounted for as well. */
-	account_process_tick(p, user_tick);
+	do_account_tick(p, user_tick);
 	run_local_timers();
 	rcu_sched_clock_irq(user_tick);
 #ifdef CONFIG_IRQ_WORK
diff --git a/kernel/time/vsyscall.c b/kernel/time/vsyscall.c
index 9577c89179cd..b51b410c92ca 100644
--- a/kernel/time/vsyscall.c
+++ b/kernel/time/vsyscall.c
@@ -9,6 +9,7 @@
 
 #include <linux/hrtimer.h>
 #include <linux/timekeeper_internal.h>
+#include <linux/ipipe_tickdev.h>
 #include <vdso/datapage.h>
 #include <vdso/helpers.h>
 #include <vdso/vsyscall.h>
@@ -73,6 +74,8 @@ void update_vsyscall(struct timekeeper *tk)
 	struct vdso_timestamp *vdso_ts;
 	u64 nsec;
 
+	ipipe_update_hostrt(tk);
+
 	/* copy vsyscall data */
 	vdso_write_begin(vdata);
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index f3f2fc8ad81a..98d7c64b7b8f 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -532,6 +532,7 @@ config DYNAMIC_FTRACE
 	bool "enable/disable function tracing dynamically"
 	depends on FUNCTION_TRACER
 	depends on HAVE_DYNAMIC_FTRACE
+	depends on !IPIPE
 	default y
 	help
 	  This option will modify all the calls to function tracing
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fbba31baef53..742ccc45978f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -35,6 +35,7 @@
 #include <linux/hash.h>
 #include <linux/rcupdate.h>
 #include <linux/kprobes.h>
+#include <linux/ipipe.h>
 
 #include <trace/events/sched.h>
 
@@ -184,8 +185,17 @@ static ftrace_func_t ftrace_ops_get_list_func(struct ftrace_ops *ops)
 
 static void update_ftrace_function(void)
 {
+	struct ftrace_ops *ops;
 	ftrace_func_t func;
 
+	for (ops = ftrace_ops_list;
+	     ops != &ftrace_list_end; ops = ops->next)
+		if (ops->flags & FTRACE_OPS_FL_IPIPE_EXCLUSIVE) {
+			set_function_trace_op = ops;
+			func = ops->func;
+			goto set_pointers;
+		}
+
 	/*
 	 * Prepare the ftrace_ops that the arch callback will use.
 	 * If there's only one ftrace_ops registered, the ftrace_ops_list
@@ -215,6 +225,7 @@ static void update_ftrace_function(void)
 
 	update_function_graph_func();
 
+  set_pointers:
 	/* If there's no change, then do nothing more here */
 	if (ftrace_trace_function == func)
 		return;
@@ -2611,6 +2622,9 @@ void __weak arch_ftrace_update_code(int command)
 
 static void ftrace_run_update_code(int command)
 {
+#ifdef CONFIG_IPIPE
+	unsigned long flags;
+#endif /* CONFIG_IPIPE */
 	int ret;
 
 	ret = ftrace_arch_code_modify_prepare();
@@ -5659,10 +5673,10 @@ static int ftrace_process_locs(struct module *mod,
 	 * reason to cause large interrupt latencies while we do it.
 	 */
 	if (!mod)
-		local_irq_save(flags);
+		flags = hard_local_irq_save();
 	ftrace_update_code(mod, start_pg);
 	if (!mod)
-		local_irq_restore(flags);
+		hard_local_irq_restore(flags);
 	ret = 0;
  out:
 	mutex_unlock(&ftrace_lock);
@@ -6203,9 +6217,11 @@ void __init ftrace_init(void)
 	unsigned long count, flags;
 	int ret;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save_notrace();
 	ret = ftrace_dyn_arch_init();
-	local_irq_restore(flags);
+	hard_local_irq_restore_notrace(flags);
+
+	/* ftrace_dyn_arch_init() returns a non-zero value on failure. */
 	if (ret)
 		goto failed;
 
@@ -6340,7 +6356,16 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 		}
 	} while_for_each_ftrace_op(op);
 out:
-	preempt_enable_notrace();
+#ifdef CONFIG_IPIPE
+	if (hard_irqs_disabled() || !ipipe_root_p)
+		/*
+		 * Nothing urgent to schedule here. At latest the timer tick
+		 * will pick up whatever the tracing functions kicked off.
+		 */
+		preempt_enable_no_resched_notrace();
+	else
+#endif
+		preempt_enable_notrace();
 	trace_clear_recursion(bit);
 }
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6e5c6b023dc3..c91eedfedf99 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2719,6 +2719,7 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	unsigned int val = cpu_buffer->current_context;
 	unsigned long pc = preempt_count();
+	unsigned long flags;
 	int bit;
 
 	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
@@ -2727,6 +2728,8 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 		bit = pc & NMI_MASK ? RB_CTX_NMI :
 			pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;
 
+	flags = hard_local_irq_save();
+
 	if (unlikely(val & (1 << (bit + cpu_buffer->nest)))) {
 		/*
 		 * It is possible that this was called by transitioning
@@ -2734,21 +2737,29 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 		 * been updated yet. In this case, use the TRANSITION bit.
 		 */
 		bit = RB_CTX_TRANSITION;
-		if (val & (1 << (bit + cpu_buffer->nest)))
+		if (val & (1 << (bit + cpu_buffer->nest))) {
+			hard_local_irq_restore(flags);
 			return 1;
+		}
 	}
 
 	val |= (1 << (bit + cpu_buffer->nest));
 	cpu_buffer->current_context = val;
 
+	hard_local_irq_restore(flags);
+
 	return 0;
 }
 
 static __always_inline void
 trace_recursive_unlock(struct ring_buffer_per_cpu *cpu_buffer)
 {
+	unsigned long flags;
+
+	flags = hard_local_irq_save();
 	cpu_buffer->current_context &=
 		cpu_buffer->current_context - (1 << cpu_buffer->nest);
+	hard_local_irq_restore(flags);
 }
 
 /* The recursive locking above uses 5 bits */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2a357bda45cf..3e8f70bbc5b7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3119,8 +3119,9 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
 	/* Don't pollute graph traces with trace_vprintk internals */
 	pause_graph_tracing();
 
+	flags = hard_local_irq_save();
+
 	pc = preempt_count();
-	preempt_disable_notrace();
 
 	tbuffer = get_trace_buf();
 	if (!tbuffer) {
@@ -3133,7 +3134,6 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
 	if (len > TRACE_BUF_SIZE/sizeof(int) || len < 0)
 		goto out;
 
-	local_save_flags(flags);
 	size = sizeof(*entry) + sizeof(u32) * len;
 	buffer = tr->trace_buffer.buffer;
 	event = __trace_buffer_lock_reserve(buffer, TRACE_BPRINT, size,
@@ -3154,7 +3154,7 @@ out:
 	put_trace_buf();
 
 out_nobuffer:
-	preempt_enable_notrace();
+	hard_local_irq_restore(flags);
 	unpause_graph_tracing();
 
 	return len;
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index aaf6793ededa..f274d9840b36 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -97,7 +97,7 @@ u64 notrace trace_clock_global(void)
 	int this_cpu;
 	u64 now;
 
-	raw_local_irq_save(flags);
+	flags = hard_local_irq_save_notrace();
 
 	this_cpu = raw_smp_processor_id();
 	now = sched_clock_cpu(this_cpu);
@@ -123,7 +123,7 @@ u64 notrace trace_clock_global(void)
 	arch_spin_unlock(&trace_clock_struct.lock);
 
  out:
-	raw_local_irq_restore(flags);
+	hard_local_irq_restore_notrace(flags);
 
 	return now;
 }
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index b611cd36e22d..57c1fc375b3b 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -190,7 +190,7 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
 	 * Need to use raw, since this must be called before the
 	 * recursive protection is performed.
 	 */
-	local_irq_save(flags);
+	flags = hard_local_irq_save();
 	cpu = raw_smp_processor_id();
 	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&data->disabled);
@@ -202,7 +202,7 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
 	}
 
 	atomic_dec(&data->disabled);
-	local_irq_restore(flags);
+	hard_local_irq_restore(flags);
 }
 
 static struct tracer_opt func_opts[] = {
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 78af97163147..1acb41d810cd 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -169,7 +169,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
 	if (tracing_thresh)
 		return 1;
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save_notrace();
 	cpu = raw_smp_processor_id();
 	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&data->disabled);
@@ -181,7 +181,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
 	}
 
 	atomic_dec(&data->disabled);
-	local_irq_restore(flags);
+	hard_local_irq_restore_notrace(flags);
 
 	return ret;
 }
@@ -250,7 +250,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
 		return;
 	}
 
-	local_irq_save(flags);
+	flags = hard_local_irq_save_notrace();
 	cpu = raw_smp_processor_id();
 	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&data->disabled);
@@ -259,7 +259,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
 		__trace_graph_return(tr, trace, flags, pc);
 	}
 	atomic_dec(&data->disabled);
-	local_irq_restore(flags);
+	hard_local_irq_restore_notrace(flags);
 }
 
 void set_graph_array(struct trace_array *tr)
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 26b06b09c9f6..c9e53d3bd0bb 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -21,6 +21,9 @@ static DEFINE_PER_CPU(int, tracing_irq_cpu);
 
 void trace_hardirqs_on(void)
 {
+	if (!ipipe_root_p)
+		return;
+
 	if (this_cpu_read(tracing_irq_cpu)) {
 		if (!in_nmi())
 			trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
@@ -35,6 +38,9 @@ NOKPROBE_SYMBOL(trace_hardirqs_on);
 
 void trace_hardirqs_off(void)
 {
+	if (!ipipe_root_p)
+		return;
+
 	if (!this_cpu_read(tracing_irq_cpu)) {
 		this_cpu_write(tracing_irq_cpu, 1);
 		tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
@@ -49,6 +55,9 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
 
 __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 {
+	if (!ipipe_root_p)
+		return;
+
 	if (this_cpu_read(tracing_irq_cpu)) {
 		if (!in_nmi())
 			trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
@@ -61,8 +70,33 @@ __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 EXPORT_SYMBOL(trace_hardirqs_on_caller);
 NOKPROBE_SYMBOL(trace_hardirqs_on_caller);
 
+__visible void trace_hardirqs_on_virt_caller(unsigned long ip)
+{
+	/*
+	 * The IRQ tracing logic only applies to the root domain, and
+	 * must consider the virtual disable flag exclusively when
+	 * leaving an interrupt/fault context.
+	 */
+	if (ipipe_root_p && !irqs_disabled())
+		trace_hardirqs_on_caller(ip);
+}
+
+__visible void trace_hardirqs_on_virt(void)
+{
+	/*
+	 * The IRQ tracing logic only applies to the root domain, and
+	 * must consider the virtual disable flag exclusively when
+	 * leaving an interrupt/fault context.
+	 */
+	if (ipipe_root_p && !irqs_disabled())
+		trace_hardirqs_on_caller(CALLER_ADDR0);
+}
+
 __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 {
+	if (!ipipe_root_p)
+		return;
+
 	lockdep_hardirqs_off(CALLER_ADDR0);
 
 	if (!this_cpu_read(tracing_irq_cpu)) {
@@ -80,14 +114,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
 
 void trace_preempt_on(unsigned long a0, unsigned long a1)
 {
-	if (!in_nmi())
+	if (ipipe_root_p && !in_nmi())
 		trace_preempt_enable_rcuidle(a0, a1);
 	tracer_preempt_on(a0, a1);
 }
 
 void trace_preempt_off(unsigned long a0, unsigned long a1)
 {
-	if (!in_nmi())
+	if (ipipe_root_p && !in_nmi())
 		trace_preempt_disable_rcuidle(a0, a1);
 	tracer_preempt_off(a0, a1);
 }
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ee00c6c8a373..4a85f0a594e3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -406,6 +406,7 @@ config MAGIC_SYSRQ
 	  keys are documented in <file:Documentation/admin-guide/sysrq.rst>.
 	  Don't say Y unless you really know what this hack does.
 
+
 config MAGIC_SYSRQ_DEFAULT_ENABLE
 	hex "Enable magic SysRq key functions by default"
 	depends on MAGIC_SYSRQ
@@ -425,6 +426,8 @@ config MAGIC_SYSRQ_SERIAL
 	  This option allows you to decide whether you want to enable the
 	  magic SysRq key.
 
+source "kernel/ipipe/Kconfig.debug"
+
 config DEBUG_KERNEL
 	bool "Kernel debugging"
 	help
diff --git a/lib/atomic64.c b/lib/atomic64.c
index e98c85a99787..9676c025cd53 100644
--- a/lib/atomic64.c
+++ b/lib/atomic64.c
@@ -25,15 +25,15 @@
  * Ensure each lock is in a separate cacheline.
  */
 static union {
-	raw_spinlock_t lock;
+	ipipe_spinlock_t lock;
 	char pad[L1_CACHE_BYTES];
 } atomic64_lock[NR_LOCKS] __cacheline_aligned_in_smp = {
 	[0 ... (NR_LOCKS - 1)] = {
-		.lock =  __RAW_SPIN_LOCK_UNLOCKED(atomic64_lock.lock),
+		.lock =  IPIPE_SPIN_LOCK_UNLOCKED,
 	},
 };
 
-static inline raw_spinlock_t *lock_addr(const atomic64_t *v)
+static inline ipipe_spinlock_t *lock_addr(const atomic64_t *v)
 {
 	unsigned long addr = (unsigned long) v;
 
@@ -45,7 +45,7 @@ static inline raw_spinlock_t *lock_addr(const atomic64_t *v)
 s64 atomic64_read(const atomic64_t *v)
 {
 	unsigned long flags;
-	raw_spinlock_t *lock = lock_addr(v);
+	ipipe_spinlock_t *lock = lock_addr(v);
 	s64 val;
 
 	raw_spin_lock_irqsave(lock, flags);
@@ -58,7 +58,7 @@ EXPORT_SYMBOL(atomic64_read);
 void atomic64_set(atomic64_t *v, s64 i)
 {
 	unsigned long flags;
-	raw_spinlock_t *lock = lock_addr(v);
+	ipipe_spinlock_t *lock = lock_addr(v);
 
 	raw_spin_lock_irqsave(lock, flags);
 	v->counter = i;
@@ -70,7 +70,7 @@ EXPORT_SYMBOL(atomic64_set);
 void atomic64_##op(s64 a, atomic64_t *v)				\
 {									\
 	unsigned long flags;						\
-	raw_spinlock_t *lock = lock_addr(v);				\
+	ipipe_spinlock_t *lock = lock_addr(v);				\
 									\
 	raw_spin_lock_irqsave(lock, flags);				\
 	v->counter c_op a;						\
@@ -82,7 +82,7 @@ EXPORT_SYMBOL(atomic64_##op);
 s64 atomic64_##op##_return(s64 a, atomic64_t *v)			\
 {									\
 	unsigned long flags;						\
-	raw_spinlock_t *lock = lock_addr(v);				\
+	ipipe_spinlock_t *lock = lock_addr(v);				\
 	s64 val;							\
 									\
 	raw_spin_lock_irqsave(lock, flags);				\
@@ -96,7 +96,7 @@ EXPORT_SYMBOL(atomic64_##op##_return);
 s64 atomic64_fetch_##op(s64 a, atomic64_t *v)				\
 {									\
 	unsigned long flags;						\
-	raw_spinlock_t *lock = lock_addr(v);				\
+	ipipe_spinlock_t *lock = lock_addr(v);				\
 	s64 val;							\
 									\
 	raw_spin_lock_irqsave(lock, flags);				\
@@ -133,7 +133,7 @@ ATOMIC64_OPS(xor, ^=)
 s64 atomic64_dec_if_positive(atomic64_t *v)
 {
 	unsigned long flags;
-	raw_spinlock_t *lock = lock_addr(v);
+	ipipe_spinlock_t *lock = lock_addr(v);
 	s64 val;
 
 	raw_spin_lock_irqsave(lock, flags);
@@ -148,7 +148,7 @@ EXPORT_SYMBOL(atomic64_dec_if_positive);
 s64 atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
 {
 	unsigned long flags;
-	raw_spinlock_t *lock = lock_addr(v);
+	ipipe_spinlock_t *lock = lock_addr(v);
 	s64 val;
 
 	raw_spin_lock_irqsave(lock, flags);
@@ -163,7 +163,7 @@ EXPORT_SYMBOL(atomic64_cmpxchg);
 s64 atomic64_xchg(atomic64_t *v, s64 new)
 {
 	unsigned long flags;
-	raw_spinlock_t *lock = lock_addr(v);
+	ipipe_spinlock_t *lock = lock_addr(v);
 	s64 val;
 
 	raw_spin_lock_irqsave(lock, flags);
@@ -177,7 +177,7 @@ EXPORT_SYMBOL(atomic64_xchg);
 s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
 	unsigned long flags;
-	raw_spinlock_t *lock = lock_addr(v);
+	ipipe_spinlock_t *lock = lock_addr(v);
 	s64 val;
 
 	raw_spin_lock_irqsave(lock, flags);
diff --git a/lib/bust_spinlocks.c b/lib/bust_spinlocks.c
index 8be59f84eaea..812d63bd8e66 100644
--- a/lib/bust_spinlocks.c
+++ b/lib/bust_spinlocks.c
@@ -16,6 +16,7 @@
 #include <linux/wait.h>
 #include <linux/vt_kern.h>
 #include <linux/console.h>
+#include <linux/ipipe_trace.h>
 
 void bust_spinlocks(int yes)
 {
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 33ffbf308853..c20f90a9e490 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -8,6 +8,7 @@
 #include <linux/export.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
+#include <linux/ipipe.h>
 #include <linux/smp.h>
 #include <linux/atomic.h>
 #include <linux/kexec.h>
@@ -56,6 +57,9 @@ void dump_stack_print_info(const char *log_lvl)
 		printk("%sHardware name: %s\n",
 		       log_lvl, dump_stack_arch_desc_str);
 
+#ifdef CONFIG_IPIPE
+	printk("I-pipe domain: %s\n", ipipe_current_domain->name);
+#endif
 	print_worker_info(log_lvl, current);
 }
 
@@ -85,6 +89,29 @@ static void __dump_stack(void)
 #ifdef CONFIG_SMP
 static atomic_t dump_lock = ATOMIC_INIT(-1);
 
+static unsigned long disable_local_irqs(void)
+{
+	unsigned long flags = 0; /* only to silence uninitialized-use warnings */
+
+	/*
+	 * When running over the head stage, we neither need nor want
+	 * to disable root stage IRQs, since CPU migration can't
+	 * happen there. Likewise, we don't want to disable hard IRQs
+	 * from the head stage, so that latency won't skyrocket as a
+	 * result of dumping the stack backtrace.
+	 */
+	if (ipipe_root_p)
+		local_irq_save(flags);
+
+	return flags;
+}
+
+static void restore_local_irqs(unsigned long flags)
+{
+	if (ipipe_root_p)
+		local_irq_restore(flags);
+}
+
 asmlinkage __visible void dump_stack(void)
 {
 	unsigned long flags;
@@ -97,7 +124,7 @@ asmlinkage __visible void dump_stack(void)
 	 * against other CPUs
 	 */
 retry:
-	local_irq_save(flags);
+	flags = disable_local_irqs();
 	cpu = smp_processor_id();
 	old = atomic_cmpxchg(&dump_lock, -1, cpu);
 	if (old == -1) {
@@ -105,7 +132,7 @@ retry:
 	} else if (old == cpu) {
 		was_locked = 1;
 	} else {
-		local_irq_restore(flags);
+		restore_local_irqs(flags);
 		/*
 		 * Wait for the lock to release before jumping to
 		 * atomic_cmpxchg() in order to mitigate the thundering herd
@@ -120,7 +147,7 @@ retry:
 	if (!was_locked)
 		atomic_set(&dump_lock, -1);
 
-	local_irq_restore(flags);
+	restore_local_irqs(flags);
 }
 #else
 asmlinkage __visible void dump_stack(void)
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 0a2ffadc6d71..b45416b23760 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -11,6 +11,7 @@
 #include <linux/sched.h>
 #include <linux/io.h>
 #include <linux/export.h>
+#include <linux/hardirq.h>
 #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
 
@@ -227,7 +228,12 @@ int ioremap_page_range(unsigned long addr,
 			break;
 	} while (pgd++, phys_addr += (next - addr), addr = next, addr != end);
 
-	flush_cache_vmap(start, end);
+	/* APEI may invoke this for temporarily remapping pages in interrupt
+	 * context - nothing we can or need to propagate globally. */
+	if (!in_interrupt()) {
+		__ipipe_pin_mapping_globally(start, end);
+		flush_cache_vmap(start, end);
+	}
 
 	return err;
 }
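
__ipipe_pin_mapping_globally() asks the pipeline layer to make a freshly created kernel mapping visible to every page table immediately, so that out-of-band code touching the ioremap'ed range never takes a lazy vmalloc-style fault which only the root stage could handle. On architectures where all mm's share the kernel mappings the hook can reasonably be empty; the stub below is a conceptual placeholder with a hypothetical name, not the definition shipped by this patch.

/* Illustration only - conceptual no-op for architectures with shared
 * kernel page tables (hypothetical name). */
static inline void example_pin_mapping_globally(unsigned long start,
						unsigned long end)
{
	/* Nothing to do: every mm already sees the new kernel mapping. */
}
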
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 60ba93fc42ce..a5cefa08500e 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -7,12 +7,19 @@
 #include <linux/export.h>
 #include <linux/kprobes.h>
 #include <linux/sched.h>
+#include <linux/ipipe.h>
 
 notrace static nokprobe_inline
 unsigned int check_preemption_disabled(const char *what1, const char *what2)
 {
 	int this_cpu = raw_smp_processor_id();
 
+	if (hard_irqs_disabled())
+		goto out;
+
+	if (!ipipe_root_p)
+		goto out;
+
 	if (likely(preempt_count()))
 		goto out;
 
diff --git a/mm/memory.c b/mm/memory.c
index 2157bb28117a..51cffe9fb7bb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -56,6 +56,7 @@
 #include <linux/export.h>
 #include <linux/delayacct.h>
 #include <linux/init.h>
+#include <linux/ipipe.h>
 #include <linux/pfn_t.h>
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
@@ -4737,6 +4738,41 @@ long copy_huge_page_from_user(struct page *dst_page,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
+#ifdef CONFIG_IPIPE
+
+int __ipipe_disable_ondemand_mappings(struct task_struct *tsk)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	int result = 0;
+
+	mm = get_task_mm(tsk);
+	if (!mm)
+		return -EPERM;
+
+	down_write(&mm->mmap_sem);
+	if (test_bit(MMF_VM_PINNED, &mm->flags))
+		goto done_mm;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (is_cow_mapping(vma->vm_flags) &&
+		    (vma->vm_flags & VM_WRITE)) {
+			result = __ipipe_pin_vma(mm, vma);
+			if (result < 0)
+				goto done_mm;
+		}
+	}
+	set_bit(MMF_VM_PINNED, &mm->flags);
+
+  done_mm:
+	up_write(&mm->mmap_sem);
+	mmput(mm);
+	return result;
+}
+EXPORT_SYMBOL_GPL(__ipipe_disable_ondemand_mappings);
+
+#endif /* CONFIG_IPIPE */
+
 #if USE_SPLIT_PTE_PTLOCKS && ALLOC_SPLIT_PTLOCKS
 
 static struct kmem_cache *page_ptl_cachep;
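
__ipipe_disable_ondemand_mappings() walks the writable private (COW) VMAs of a task and pre-commits them, then records the fact with MMF_VM_PINNED so that later mprotect() calls preserve the guarantee (see the mm/mprotect.c hunk further down). A real-time core would typically call it once, before granting a process out-of-band execution rights; the caller below is a hypothetical sketch, not part of this patch.

/* Illustration only - hypothetical co-kernel caller. */
static int example_prepare_oob_process(void)
{
	int ret = __ipipe_disable_ondemand_mappings(current);

	if (ret)
		printk(KERN_WARNING
		       "example: cannot pin mappings of %s[%d]: %d\n",
		       current->comm, task_pid_nr(current), ret);
	return ret;
}
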
diff --git a/mm/mlock.c b/mm/mlock.c
index a72c1eeded77..a10e73ab97ee 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -871,3 +871,29 @@ void user_shm_unlock(size_t size, struct user_struct *user)
 	spin_unlock(&shmlock_user_lock);
 	free_uid(user);
 }
+
+#ifdef CONFIG_IPIPE
+int __ipipe_pin_vma(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	unsigned int gup_flags = 0;
+	int ret, len;
+
+	if (vma->vm_flags & (VM_IO | VM_PFNMAP))
+		return 0;
+
+	if (!((vma->vm_flags & VM_DONTEXPAND) ||
+	    is_vm_hugetlb_page(vma) || vma == get_gate_vma(mm))) {
+		ret = populate_vma_page_range(vma, vma->vm_start, vma->vm_end,
+					      NULL);
+		return ret < 0 ? ret : 0;
+	}
+
+	if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE)
+		gup_flags |= FOLL_WRITE;
+	len = DIV_ROUND_UP(vma->vm_end, PAGE_SIZE) - vma->vm_start / PAGE_SIZE;
+	ret = get_user_pages_locked(vma->vm_start, len, gup_flags, NULL, NULL);
+	if (ret < 0)
+		return ret;
+	return ret == len ? 0 : -EFAULT;
+}
+#endif
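
Since the mm layer keeps vm_start and vm_end page-aligned, the page count computed by __ipipe_pin_vma() reduces to a plain shift - a VMA spanning 0x1000..0x5000 yields 4 pages either way:

/* Illustration only - equivalent page count, assuming page-aligned
 * vm_start/vm_end. */
static inline unsigned long example_vma_pages(const struct vm_area_struct *vma)
{
	return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
}
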
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index a1da47e02747..ab150be0e88e 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -9,6 +9,7 @@
 #include <linux/sched/task.h>
 #include <linux/mmu_context.h>
 #include <linux/export.h>
+#include <linux/ipipe.h>
 
 #include <asm/mmu_context.h>
 
@@ -23,10 +24,12 @@ void use_mm(struct mm_struct *mm)
 {
 	struct mm_struct *active_mm;
 	struct task_struct *tsk = current;
+	unsigned long flags;
 
 	task_lock(tsk);
 	/* Hold off tlb flush IPIs while switching mm's */
 	local_irq_disable();
+	ipipe_mm_switch_protect(flags);
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
@@ -34,6 +37,7 @@ void use_mm(struct mm_struct *mm)
 	}
 	tsk->mm = mm;
 	switch_mm_irqs_off(active_mm, mm, tsk);
+	ipipe_mm_switch_unprotect(flags);
 	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
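
ipipe_mm_switch_protect()/ipipe_mm_switch_unprotect() bracket the mm switch so the head stage can never observe a half-installed address space. The actual definitions come from the I-pipe headers and are architecture-specific; conceptually they amount to hard IRQ masking, as in the sketch below (hypothetical names, not the real macros).

/* Illustration only - conceptual equivalent of the switch guards. */
#define example_mm_switch_protect(flags)	\
	do { (flags) = hard_local_irq_save(); } while (0)
#define example_mm_switch_unprotect(flags)	\
	hard_local_irq_restore(flags)
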
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 95dee88f782b..0cdc97eaead3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -22,6 +22,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
+#include <linux/ipipe.h>
 #include <linux/migrate.h>
 #include <linux/perf_event.h>
 #include <linux/pkeys.h>
@@ -41,7 +42,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	pte_t *pte, oldpte;
 	spinlock_t *ptl;
-	unsigned long pages = 0;
+	unsigned long pages = 0, flags;
 	int target_node = NUMA_NO_NODE;
 
 	/*
@@ -109,6 +110,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					continue;
 			}
 
+			flags = hard_local_irq_save();
 			oldpte = ptep_modify_prot_start(vma, addr, pte);
 			ptent = pte_modify(oldpte, newprot);
 			if (preserve_write)
@@ -121,6 +123,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				ptent = pte_mkwrite(ptent);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
+			hard_local_irq_restore(flags);
 			pages++;
 		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
@@ -338,6 +341,12 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
 		pages = hugetlb_change_protection(vma, start, end, newprot);
 	else
 		pages = change_protection_range(vma, start, end, newprot, dirty_accountable, prot_numa);
+#ifdef CONFIG_IPIPE
+	if (test_bit(MMF_VM_PINNED, &vma->vm_mm->flags) &&
+	    ((vma->vm_flags | vma->vm_mm->def_flags) & VM_LOCKED) &&
+	    (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
+		__ipipe_pin_vma(vma->vm_mm, vma);
+#endif
 
 	return pages;
 }
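
The hard masking added to change_pte_range() closes the window during which ptep_modify_prot_start() leaves the PTE transiently cleared: an out-of-band handler touching that address in between would fault with no way for the root stage to fix it up in time. The pattern boils down to the following sketch, a hypothetical helper using the same calls as the hunk above.

/* Illustration only - hypothetical helper showing the guarded
 * start/commit sequence. */
static inline void example_change_pte_protected(struct vm_area_struct *vma,
						unsigned long addr, pte_t *pte,
						pgprot_t newprot)
{
	unsigned long flags;
	pte_t oldpte, ptent;

	flags = hard_local_irq_save();	/* keep out-of-band code away */
	oldpte = ptep_modify_prot_start(vma, addr, pte);
	ptent = pte_modify(oldpte, newprot);
	ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
	hard_local_irq_restore(flags);
}
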
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5797e1eeaa7e..80dff9b8d391 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -238,6 +238,8 @@ static int vmap_page_range_noflush(unsigned long start, unsigned long end,
 			return err;
 	} while (pgd++, addr = next, addr != end);
 
+	__ipipe_pin_mapping_globally(start, end);
+
 	return nr;
 }
 
