Rahul Sharma (Editor)

InfinityDB

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit

InfinityDB is an all-Java embedded database engine with an extended java.util.concurrent.ConcurrentNavigableMap interface (a subinterface of java.util.Map) that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-Tree architecture that enables client programmers to reach high levels of performance without risk of failures. Data is stored to and retrieved from a single embedded database file using the InfnityDB API that allows direct access to the variable length item spaces. Database client programmers can construct traditional relations as well as specialized models that directly satisfy the needs of the dependent application. There is no limit to the number of items, database size, or JVM size, so InfinityDB can function in both the smallest environment that provides random access storage and can be scaled to large settings. Traditional relations and specialized models can be directed to the same database file. InfinityDB can be optimized for standard relations as well as all other types of data, allowing client applications to perform at a minimum of one million operations per second on a virtual, 8-core system.

Contents

AirConcurrentMap, is an in-memory map that implements the Java ConcurrentMap interface, but internally it uses a multi-core design so that its performance and memory make it the fastest Java Map when ordering is performed and it holds medium to large numbers of entries. AirConcurrentMap iteration is faster than any Java Map iterators, regardless of the specific map type.

Map API

InfinityDB can be accessed as an extended standard java.util.concurrent.ConcurrentNavigableMap, or via a low-level 'ItemSpace' API. The ConcurrentNavigableMap interface is a subinterface of java.util.Map but has special ordering and concurrency methods: this is the same API as java.util.concurrent.ConcurrentSkipListMap. Maps may be nested to form complex structures. The Maps have the standard semantics, but work internally on a 'tuple space', while the Maps are not actually stored but are helpers, each representing nothing more than an immutable tuple prefix. Maps may be created dynamically at high speed if needed for access, and are thread-safe and multi-core concurrent. The key and value types available include all Java primitive data types, Dates, Strings, small char or byte arrays, 'ByteStrings', 'huge array' indexes, Character Long Objects or Binary Long Objects, plus the special-purpose types 'EntityClass' and 'Attribute'. Maps may be multi-valued. Applications may choose to use the Map-based access alone and may mix in lower-level 'ItemSpace' access over the same tuples, as the Map access is just a wrapper and there is no tuple-level distinction.

The lower level 'ItemSpace' Data Model

The data primitives are called 'components' and they are atomic. Components can be concatenated into short composites called 'Items' which are the unit of storage and retrieval. Higher-level structures that combine these Items include unlimited size records of an unlimited number of columns or attributes, with complex attribute values of unlimited size. Keys may be a composition of components. Attribute values can be ordered sets of composite components, character large objects (CLOB's), binary large objects (BLOB's), or unlimited sparse arrays. Other higher-level structures built of multiple Items include key/value associations like ordered maps, ordered sets, Entity-Attribute-Value nets of quadruples, trees, DAG's, taxonomies, or full-text indexes. Mixtures of these can occur along with custom client-defined structures.

Data Encoding

An 'ItemSpace' represents the entire database, and it is a simple ordered set of Items, with no other state. An Item is actually stored with each component encoded in variable-length binary form in a char array, with components being self-describing in a standard format which sorts correctly. Programmers deal with the components only as primitives, and the stored data is strongly typed. Data is not stored as text to be parsed with weak typing as in JSON or XML, nor is it parsed out of programmer-defined binary stream representations. There are no custom client-devised binary formats that can grow brittle, and which can have security, documentation, upgrade, testing, versioning, scaling, and debugging problems, such as is the case with Java Object serialization.

Performance Scaling

All access to the system is via a few basic methods that can store or retrieve in order one variable-length 'Item' or 'tuple' at a time at a speed which is on the order of 1M operations/second aggregated over multiple threads when in memory. The operations are either the standard Map API for get(), put(), iterators, and so on, or at the lower level, insert(), delete(), update(), first(), next(), last(), and previous(). Typical Items are about 30 bytes uncompressed in memory, but LOB's for example use 1KB Items. Because each operation affects only one Item, small data structures are fast to access. This is in contrast to chunked access, such as for example formatting and parsing entire JSON or XML texts or entire Java Object serialization graphs. The space and performance scaling of an ItemSpace is smooth as any size of client-imposed multi-Item structure is created, grows, shrinks, or disappears. On-storage performance is like any block-oriented B-Tree, with blocks of about 4KB, which is O(log(n)) per access. There is a block cache of 2.5MB by default, which is of unlimited size but which is often about 100MB. The cache grows only as needed.

Space Scaling

For performance and efficiency, the Items are stored inside a single B-Tree prefix-compressed and variable length as an uninterpreted sequence of bytes for further compression. The B-Tree may typically grow to the 100GB's range but has no limits. There is only one file, so there is no log or other files to write to and flush. InfinityDB minimizes the size of its database file through four types of compression (prefix, suffix, zlib, and UTF-8).

Schemaless Upgrade and Downgrade

schema upgrade when structures are extended or modified is done by adding or removing Items in new ways at runtime, and there are no upgrade scripts - hence the data model is No-SQL and schemaless. Because an empty or non-existent piece of data requires no space by virtue of having no Items associated with it, it is possible to extend the structure simply by adding Items of new kinds. If Items are removed, all logical and physical space is immediately reclaimed.

For example, backwards compatibility with older databases in a relational model while making a single-valued attribute into a multi-value attribute can be achieved simply by storing more values (one per Item) alongside the existing one without deleting the existing one, since every attribute is already logically multi-value and is logically composite. There is little code change and no internal or external database structure change. If a value is deleted and it is the last value of a multi-value attribute, all of its space is recovered immediately. In other words, every attribute can logically store an ordered set of optionally composite values, but there is no cost for the generality in common cases, and no cost at all for null values.

If a new table is to be added, it is created at the moment its first row is inserted. The first row appears only when the first attribute value is added to it. Forwards compatibility with future databases is possible because future databases may have new tables or any other structure that can be discovered and used, can be simply ignored or can be handled opaquely by older code.

Transactionality

Both global 'ACD' and per-thread 'ACID' transactions are provided. Each InfinityDB instance stores data into a single database file and does not require additional log or rollback files. In the event of any catastrophe except power failure or other hardware malfunction, the database is guaranteed to be consistent with the status as of completion of the last global commit. Recovery after abrupt termination is immediate and requires no slow log replay. Bulk loading is globally transactional with unlimited data size, and is concurrent with all other uses. Global transactions do not provide inter-thread isolation, so the semantics are 'ACD' (rather than 'ACID'). Alternatively, ACID transactions employ optimistic locking to allow inter-thread isolation.

Immediate Garbage Collection

As data structures grow and shrink, freed space is reclaimed immediately and made available for other structures. Systems can run indefinitely without gradual space leaks or long interruptions during garbage reclamation phases. When a data structure becomes empty, all of its space is recycled, rather than leaving behind 'tombstones' or other place holders. For example, a possibly very large multi-value attribute may shrink to one value, becoming as efficient as any single-valued attribute, and if that last value is deleted, all space for it is reclaimed, including the space for the attribute it was attached to, and if a row has only attributes with no values, the row is reclaimed completely as well. If a table loses all of its rows, the space for the table is reclaimed. Any size or type of data structure has this property. There are no reference counters, hence any type of graph is incrementally collected automatically.

Products

InfinityDB Version 4.0 features:

  • Extended API for the Map interface that allows a 'tuple' organized underlying persistent structure to be manipulated by nested in-memory Maps which are not stored.
  • Extended data primitive types
  • 'Engine level' API for faster access to the supporting 'ItemsSpace' data model.
  • Compression in persistent storage.
  • AirConcurrentMap Version 3 features:

  • Fastest in-memory Java Map when concurrency and ordering is used on Maps that are medium to large in size. This is patent-pending.
  • Better memory efficiency than all standard Java library Maps above about 1K Entries.
  • A proprietary parallel Map scan is faster than that for Java 1.8.
  • forEach is faster than for Java 1.8 Maps.
  • For both products:

  • Concurrent, multi-threaded processing on multiple cores without locks increases performance on multi-core platforms, such as the Intel i7 by approximately seven times. Both products are patent pending.
  • Standard Java Map access is used. An enhanced java.util.concurrent.ConcurrentNavigable interface is implemented, allowing direct substitution into any existing application or test code. This interface provides specialized concurrency methods as well as ordering features improving the original SortedSet.
  • History

    Roger L. Deran designed and developed the Infinity Database Engine over 20 years ago and holds US Patent 5283894. The Infinity Database Engine was first deployed in Intel 8088 assembly language in the ROSCOR sports video editor (RSVE) that was licensed to NFL teams in the 1980s. Lexicon purchased the RSVE in 1989, and greatly expanded its deployment to all types of professional and college sports. Java version 2.0 added transactionality, and version 3.0 added concurrency features which are patent pending, and apply to InfinityDB as well as AirConcurrentMap. Infinity DB remains in active use in thousands of sites of various kinds, while AirConcurrentMap is new.

    Uses of the all-JAVA InfinityDB, marketed by Boiler Bay Inc. since 2002, include:

  • consolidation of pharmaceutical and medical data
  • collection, description, consolidation, and sharing of ornithological data
  • representation of taxonomies of various types
  • programming environment tools such as source code repository navigation
  • text indexers
  • email consolidation systems
  • distributed industrial data collection systems.
  • References

    InfinityDB Wikipedia