Dynamic software updating

Updated on Dec 20, 2024

Edit

Comment

In computer science, dynamic software updating (DSU) is a field of research pertaining to upgrading programs while they are running. DSU is not currently widely used in industry. However, researchers have developed a wide variety of systems and techniques for implementing DSU. These systems are commonly tested on real-world programs.

Current operating systems and programming languages are typically not designed with DSU in mind. As such, DSU implementations commonly either utilize existing tools, or implement specialty compilers. These compilers preserve the semantics of the original program, but instrument either the source code or object code to produce a dynamically updateable program. Researchers compare DSU-capable variants of programs to the original program to assess safety and performance overhead.

Introduction

Any running program can be thought of a tuple ( δ , P ) , where δ is the current program state and P is the current program code. Dynamic software updating systems transform a running program ( δ , P ) to a new version ( δ ′ , P ′ ) . In order to do this, the state must be transformed into the representation P ′ expects. This requires a state transformer function. Thus, DSU transforms a program ( δ , P ) to ( S ( δ ) , P ′ ) . An update is considered valid if and only if the running program ( S ( δ ) , P ′ ) can be reduced to a point tuple ( δ , P ′ ) that is reachable from the starting point of the new version of the program, ( δ i n i t , P ′ ) .

The location in a program where a dynamic update occurs is referred to as an update point. Existing DSU implementations vary widely in their treatment of update points. In some systems, such as UpStare and PoLUS, an update can occur at any time during execution. Ginseng's compiler will attempt to infer good locations for update points, but can also use programmer-specified update points. Kitsune and Ekiden require developers to manually specify and name all update points.

Updating systems differ in the types of program changes that they support. For example, Ksplice only supports code changes in functions, and does not support changes to state representation. This is because Ksplice primarily targets security changes, rather than general updates. In contrast, Ekiden can update a program to any other program capable of being executed, even one written in a different programming language. Systems designers can extract valuable performance or safety assurances by limiting the scope of updates. For example, any update safety check limits the scope of updates to updates which pass that safety check. The mechanism used to transform code and state influences what kinds of updates a system will support.

DSU systems, as tools, can also be evaluated on their ease-of-use and clarity to developers. Many DSU systems, such as Ginseng, require programs to pass various static analyses. While these analyses prove properties of programs that are valuable for DSU, they are by nature sophisticated and difficult to understand. DSU systems that do not use a static analysis might require use of a specialized compiler. Some DSU systems require neither static analysis nor specialty compilers.

Programs that are updated by a DSU system are referred to as target programs. Academic publications of DSU systems commonly include several target programs as case studies. vsftpd, OpenSSH, PostgreSQL, Tor, Apache, GNU Zebra, memcached, and Redis are all dynamic updating targets for various systems. Since few programs are written with support for dynamic updating in mind, retrofitting existing programs is a valuable means of evaluating a DSU system for practical use.

The problem space addressed by dynamic updating can be thought of as an intersection of several others. Examples include checkpointing, dynamic linking, and persistence. As an example, a database that must be backward-compatible with previous versions of its on-disk file format, must accomplish the same type of state transformation expected of a dynamic updating system. Likewise, a program that has a plugin architecture, must be able to load and execute new code at runtime.

Similar techniques are sometimes also employed for the purpose of dynamic dead-code elimination to remove conditionally dead or unreachable code at load or runtime, and recombine the remaining code to minimize its memory footprint or improve speed.

History

The earliest precursor to dynamic software updating is redundant systems. In a redundant environment, spare systems exist ready to take control of active computations in the event of a failure of the main system. These systems contain a main machine and a hot spare. The hot spare would be periodically seeded with a checkpoint of the primary system. In the event of a failure, the hot spare would take over, and the main machine would become the new hot spare. This pattern can be generalized to updating. In the event of an update, the hot spare would activate, the main system would update, and then the updated system would resume control.

The earliest true Dynamic Software Updating system is DYMOS (Dynamic Modification System). Presented in 1983 in the PhD dissertation of Insup Lee, DYMOS was a fully integrated system that had access to an interactive user interface, a compiler and runtime for a Modula variant, and source code. This enabled DYMOS to type-check updates against the existing program.

Implementation

DSU systems must load new code into a running program, and transform existing state into a format that is understood by the new code. Since many motivational use-cases of DSU are time-critical (for example, deploying a security fix on a live and vulnerable system), DSU systems must provide adequate update availability. Some DSU systems also attempt to ensure that updates are safe before applying them.

There is no one canonical solution to any of these problems. Typically, a DSU system that performs well in one problem area does so at a trade-off to others. For example, empirical testing of dynamic updates indicates that increasing the number of update points results in an increased number of unsafe updates.

Code transformation

Most DSU systems use subroutines as the unit of code for updates; however, newer DSU systems implement whole-program updates.

If the target program is implemented in a virtual machine language, the VM can use existing infrastructure to load new code, since modern virtual machines support runtime loading for other use cases besides DSU (mainly debugging). The HotSpot JVM supports runtime code loading, and DSU systems targeting Java (programming language) can utilize this feature.

In native languages such as C or C++, DSU systems can use specialized compilers that insert indirection into the program. At update time, this indirection is updated to point to the newest version. If a DSU system does not use a compiler to insert these indirections statically, it insert them at runtime with binary rewriting. Binary rewriting is the process of writing low-level code into the memory image of a running native program to re-direct functions. While this requires no static analysis of a program, it is highly platform-dependent.

Ekiden and Kitsune load new program code via starting an entirely new program, either through fork-exec or dynamic loading. The existing program state is then transferred to the new program space.

State transformation

During an update, program state must be transformed from the original representation to the new version's representation. This is referred to as state transformation. A function which transforms a state object or group of objects is referred to as a transformer function or state transformer.

DSU systems can either attempt to synthesize transformer functions, or require that the developer manually supply them. Some systems mix these approaches, inferring some elements of transformers while requiring developer input on others.

These transformer functions can either be applied to program state lazily, as each piece of old-version state is accessed, or eagerly, transforming all state at update time. Lazy transformation ensures that the update will complete in constant time, but also incurs steady-state overhead on object access. Eager transformation incurs more expense at the time of update, requiring the system to stop the world while all transformers run. However, eager transformation allows compilers to fully optimize state access, avoiding the steady-state overhead involved with lazy transformation.

Update safety

Most DSU systems attempt to show some safety properties for updates. The most common variant of safety checking is type safety, where an update is considered safe if it does not result in any new code operating on an old state representation, or vice versa.

Type safety is typically checked by showing one of two properties, activeness safety or cons-freeness safety. A program is considered activeness-safe if no updated function exists on the call stack at update time. This proves safety because control can never return to old code that would access new representations of data.

Cons-Freeness is another way to prove type safety, where a section of code is considered safe if it does not access state of a given type in a way that requires knowledge of the type representation. This code can be said to not access the state concretely, while it may access the state abstractly. It is possible to prove or disprove cons-freeness for all types in any section of code, and the DSU system Ginseng uses this to prove type safety. If a function is proven cons-free, it can be updated even if it is live on the stack, since it will not cause a type error by accessing state using the old representation.

Empirical analysis of cons-freeness and activeness safety by Hayden et all show that both techniques permit most correct updates and deny most erroneous updates. However, manually selecting update points results in zero update errors, and still allows frequent update availability.

DYMOS

DYMOS is notable in that it was the earliest proposed DSU system. DYMOS consists of a fully integrated environment for programs written in a derivative of Modula, giving the system access to a command interpreter, source code, compiler, and runtime environment, similar to a REPL. In DYMOS, updates are initiated by a user executing a command in the interactive environment. This command includes directives specifying when an update can occur, called when-conditions. The information available to DYMOS enables it to enforce type-safety of updates with respect to the running target program.

Ksplice, kpatch and kGraft

Ksplice is a DSU system that targets only the Linux kernel, making itself one of the specialized DSU systems that support an operating system kernel as the target program. Ksplice uses source-level diffs to determine changes between current and updated versions of the Linux kernel, and then uses binary rewriting to insert the changes into the running kernel. Ksplice was maintained by a commercial venture founded by its original authors, Ksplice Inc., which was acquired by Oracle Corporation in July 2011. Ksplice is used on a commercial basis and exclusively in the Oracle Linux distribution.

SUSE developed kGraft as an open-source alternative for live kernel patching, and Red Hat did likewise with kpatch. They both allow function-level changes to be applied to a running Linux kernel, while relying on live patching mechanisms established by ftrace. The primary difference between kGraft and kpatch is the way they ensure runtime consistency of the updated code sections while hot patches are applied. kGraft and kpatch were submitted for inclusion into the Linux kernel mainline in April 2014 and May 2014, respectively, and the minimalistic foundations for live patching were merged into the Linux kernel mainline in kernel version 4.0, which was released on April 12, 2015.

Since April 2015, there is ongoing work on porting kpatch and kGraft to the common live patching core provided by the Linux kernel mainline. However, implementation of the function-level consistency mechanisms, required for safe transitions between the original and patched versions of functions, has been delayed because the call stacks provided by the Linux kernel may be unreliable in situations that involve assembly code without proper stack frames; as a result, the porting work remains in progress as of September 2015. In an attempt to improve the reliability of kernel's call stacks, a specialized sanity-check stacktool userspace utility has also been developed with the purpose of checking kernel's compile-time object files and ensuring that the call stack is always maintained; it also opens up a possibility for achieving more reliable call stacks as part of the kernel oops messages.

Ginseng

Ginseng is a general-purpose DSU system. It is the only DSU system to use the cons-freeness safety technique, allowing it to update functions that are live on the stack as long as they do not make concrete accesses to updated types.

Ginseng is implemented as a source-to-source compiler written using the C Intermediate Language framework in OCaml. This compiler inserts indirection to all function calls and type accesses, enabling Ginseng to lazily transform state at the cost of imposing a constant-time overhead for the entirety of the program execution. Ginseng's compiler proves the cons-freeness properties of the entire initial program and of dynamic patches.

Later versions of Ginseng also support a notion of transactional safety. This allows developers to annotate a sequence of function calls as a logical unit, preventing updates from violating program semantics in ways that are not detectable by either activeness safety or cons-freeness safety. For example, in two versions of OpenSSH examined by Ginseng's authors, important user verification code was moved between two functions called in sequence. If the first version of the first function executed, an update occurred, and the new version of the second function was executed, then the verification would never be performed. Marking this section as a transaction ensures that an update will not prevent the verification from occurring.

UpStare

UpStare is a DSU system that uses a unique updating mechanism, stack reconstruction. To update a program with UpStare, a developer specifies a mapping between any possible stack frames. UpStare is able to use this mapping to immediately update the program at any point, with any number of threads, and with any functions live on the stack.

PoLUS

PoLUS is a binary-rewriting DSU system for C. It is able to update unmodified programs at any point in their execution. To update functions, it rewrites the prelude to a target function to redirect to a new function, chaining these redirections over multiple versions. This avoids steady-state overhead in functions that have not been updated.

Katana

Katana is a research system that provides limited dynamic updating (similar to Ksplice and its forks) for user-mode ELF binaries. The Katana patching model operates on the level of ELF objects, and thus has the capacity to be language-agnostic as long as the compilation target is ELF.

Kitsune and Ekiden

Ekiden and Kitsune are two variants of a single DSU system that implements the state-transfer style of DSU for programs written in C. Rather than updating functions within a single program, Ekiden and Kitsune perform updates over whole programs, transferring necessary state between the two executions. While Ekiden accomplishes this by starting a new program using the UNIX idiom of fork-exec, serializing the target program's state, and transferring it, Kitsune uses dynamic linking to perform "in-place" state transfer. Kitsune is derived from Ekiden's codebase, and can be considered a later version of Ekiden.

Ekiden and Kitsune are also notable in that they are implemented primarily as application-level libraries, rather than specialized runtimes or compilers. As such, to use Ekiden or Kitsune, an application developer must manually mark state that is to be transferred, and manually select points in the program where an update can occur. To ease this process, Kitsune includes a specialized compiler that implements a domain-specific language for writing state transformers.

Erlang

Erlang supports Dynamic Software Updating, though this is commonly referred to as "hot code loading". Erlang requires no safety guarantees on updates, but Erlang culture suggests that developers write in a defensive style that will gracefully handle type errors generated by updating.

Pymoult

Pymoult is a prototyping platform for dynamic update written in Python. It gathers many techniques from other systems, allowing their combination and configuration. The objective of this platform is to let developers chose the update techniques they find more appropriate for their needs. For example, one can combine lazy update of the state as in Ginseng while changing the whole code of the application as in Kitsune or Ekiden.

References

Dynamic software updating Wikipedia

(Text) CC BY-SA

Contents