Jags Ramnarayan on In-Memory Data Grids – TheMoneyOffice

In-memory databases, or in-memory data grids (IMDGs), have been gaining attention recently because they offer dynamic scalability and higher performance for data-intensive applications than traditional databases. Products in this space include Oracle Coherence, IBM's WebSphere eXtreme Scale and DataPower XC10, Red Hat's JBoss Infinispan, VMware's GemFire and Terracotta's Ehcache and BigMemory.

InfoQ spoke with Jags Ramnarayan, chief architect of GemFire products at VMware, about the architecture of in-memory data grids, their advantages over traditional data stores, and the emerging trends in this space.

InfoQ: How does the architecture of an in-memory data grid (IMDG) differ from that of a traditional database?

Jags Ramnarayan: Traditional relational databases are designed on sound principles of data durability, isolation and independence, but the design is centralized and tends to become disk-I/O bound under high write rates. The need to acquire locks at various levels (row, page, table) and to flush transaction state to disk for durability and isolation introduces scalability challenges, forcing users to scale up onto "beefier" machines (vertical scaling).

Data grids manage data in memory (as the name suggests), mainly to avoid costly disk seeks. The focus shifts from optimizing disk I/O to managing data across a network. Scalability is provided by replication (for slowly changing but frequently requested data) and partitioning (for large data volumes). Data changes are applied synchronously on multiple nodes for protection against failures and, in some advanced data grids, data can be replicated asynchronously over a WAN for disaster recovery.
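To make partitioning with synchronous redundant copies concrete, here is a minimal single-process sketch. It is not how any particular product places data: the `Grid` class, the `owners` function and the node names are hypothetical, and a simple modulo placement stands in for whatever bucket assignment a real grid uses.

```python
import hashlib

def stable_hash(key):
    # md5 is used only as a stable, well-spread hash
    # (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

def owners(key, nodes, redundant_copies=1):
    """Primary node first, then the nodes that hold redundant copies."""
    start = stable_hash(key) % len(nodes)
    count = min(redundant_copies + 1, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

class Grid:
    """Hypothetical in-process stand-in for a partitioned grid."""

    def __init__(self, nodes, redundant_copies=1):
        self.nodes = list(nodes)
        self.redundant_copies = redundant_copies
        self.storage = {node: {} for node in self.nodes}

    def put(self, key, value):
        # "Synchronous" here means every owner is written before put() returns.
        for node in owners(key, self.nodes, self.redundant_copies):
            self.storage[node][key] = value

    def get(self, key, failed=()):
        # Try the primary first, then fall back to a redundant copy.
        for node in owners(key, self.nodes, self.redundant_copies):
            if node not in failed and key in self.storage[node]:
                return self.storage[node][key]
        raise KeyError(key)

nodes = ["node-a", "node-b", "node-c"]
grid = Grid(nodes, redundant_copies=1)
grid.put("order:42", {"total": 99})

# Simulate losing the primary: the read is served from the redundant copy.
primary = owners("order:42", nodes)[0]
survivor_value = grid.get("order:42", failed={primary})
```

Each key lives on only two of the three nodes, so capacity grows as nodes are added, while the redundant copy keeps the entry readable when its primary is lost.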

You can think of data grids as distributed caches with many important additional features. For example, a data grid can be integrated with relational databases through out-of-the-box integration services such as "read-through", transactional "write-through" and asynchronous, batched "write-behind".

Mostly, IMDGs complement traditional databases. The most common design pattern is a "distributed cache" in front of one or more databases. Compared to a traditional database on its own, you get faster data access, lower and more predictable latency, significantly higher write throughput and generally higher availability. You can also decouple the application from your data sources by moving database-specific logic into Java code running on the grid, so that running the application no longer depends on the availability of those data sources.

InfoQ: How do IMDGs differ from distributed caching solutions such as Memcached?

Distributed caching products like Memcached excel at being simple, high-performance, in-memory key-value stores. The secret to their "scaling" is that the servers are completely independent of each other. It is the client (configured with a list of all servers) that ties the servers' data together. A hash function on each client resolves keys to servers. This has implications for data consistency: even the simplest case requires all clients to have the same server list. If different clients have different server lists or different hash functions, all consistency bets are off. It also means that scaling the data layer requires updating the client configuration and retesting. There is no built-in support for replication, so any network partition or server crash can cause a loss of availability.
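The consistency risk of client-side hashing is easy to demonstrate. The sketch below is illustrative only: `pick_server` uses a plain modulo hash rather than the hashing scheme of any real Memcached client, and the server addresses are made up. Two clients share three servers, but the second client has also been told about a fourth.

```python
import hashlib

def pick_server(key, servers):
    # Each client resolves the key on its own; the servers never coordinate.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

# Hypothetical server lists: client B was reconfigured, client A was not.
client_a = ["cache1:11211", "cache2:11211", "cache3:11211"]
client_b = ["cache1:11211", "cache2:11211", "cache3:11211", "cache4:11211"]

keys = [f"user:{i}" for i in range(1000)]
disagreements = sum(
    1 for key in keys if pick_server(key, client_a) != pick_server(key, client_b)
)
print(f"{disagreements} of {len(keys)} keys resolve to different servers")
```

With modulo placement, the two clients agree only when the hash leaves the same remainder modulo 3 and modulo 4, so roughly three quarters of the keys end up on different servers. A value written by one client is then invisible to, or stale for, the other.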

IMDG servers are tightly clustered and always aware of each other. They use several mechanisms to establish distributed consensus and to provide stronger consistency guarantees. The list of IMDG capabilities is extensive, yet it is often assumed that Memcached will give you the highest level of performance. IMDGs support tiered caching, so frequently used data can be cached in the client processes and kept automatically in sync with the data in the server cluster. Reads served this way need no network hop, which gives superior performance.
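Tiered caching can be sketched as a client-side cache that the cluster keeps up to date. This is a simplified, single-process illustration; `ServerCluster` and `NearCacheClient` are hypothetical names, and the callback stands in for whatever subscription channel a real grid uses.

```python
class ServerCluster:
    """Hypothetical stand-in for the server tier."""

    def __init__(self):
        self._data = {}
        self._subscribers = []
        self.reads = 0  # counts simulated network round trips for reads

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def get(self, key):
        self.reads += 1
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
        for notify in self._subscribers:
            notify(key, value)  # push the change to every connected client

class NearCacheClient:
    """Client tier: serves repeated reads from local memory."""

    def __init__(self, cluster):
        self._cluster = cluster
        self._local = {}
        cluster.subscribe(self._on_update)

    def _on_update(self, key, value):
        # Only refresh entries this client already holds.
        if key in self._local:
            self._local[key] = value

    def get(self, key):
        if key not in self._local:
            self._local[key] = self._cluster.get(key)  # one hop on first access
        return self._local[key]

cluster = ServerCluster()
cluster.put("price:ACME", 10)
client = NearCacheClient(cluster)

first = client.get("price:ACME")   # fetched from the cluster
second = client.get("price:ACME")  # served locally, no extra read
cluster.put("price:ACME", 12)      # the update is pushed to the client
third = client.get("price:ACME")   # fresh value, still no extra read
```

After the first access, `cluster.reads` stays at 1 even though the client sees the updated price, which is the effect the tiered design is after.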

InfoQ: What about NoSQL? What are the major differences?

NoSQL solutions come in a variety of colors and flavors. Solutions such as MongoDB, Cassandra and HBase all share the same horizontal scalability goals and are often deployed as an alternative to traditional databases. IMDGs are positioned primarily as an operational data layer built around distributed caching. All the products in this space are evolving rapidly. IMDGs (such as GemFire) have added sophisticated "shared-nothing" disk persistence, and some in the NoSQL camp have improved their use of memory for greater performance. As a result, there is increasing overlap in the underlying technology.

There are several differences, and in each of these areas IMDGs have an advantage: support for distributed transactions, scatter-gather parallel query processing, tiered caching, support for publish-subscribe event processing, a framework for integrating with existing databases, and replication over wide area networks. Most IMDGs are designed for languages such as Java, C# and C++, while NoSQL products such as MongoDB make it very easy for developers to work with their APIs from JavaScript.

InfoQ: According to a recent Gartner report, in-memory data grid technology enables new computing paradigms such as cloud, complex-event processing (CEP) and data analytics. What do you think is the role of IMDGs in these computing paradigms?

In many cases, cloud deployment promises elastic scalability and increased availability. Demand for a service can vary, with unpredictable spikes, so from this point of view an IMDG is a great fit. When spikes occur, automatic detection and resource provisioning (hardware capacity) is handled through virtualization (such as automatic provisioning with vCloud Director in a VMware environment). Integrated with this, an IMDG today can expand or contract without any operator assistance.

Advanced IMDGs support CEP through a feature called "continuous querying": clients register interest in data using queries, and the grid pushes "events" to those clients whenever an update to the IMDG affects the query results. This feature enables a new breed of real-time, push-oriented applications, where events can be pushed to thousands of devices running the application.
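The behaviour of a continuous query can be sketched in a few lines. This is not the API of GemFire or any other product: `ContinuousQueryRegion`, its `register` method and the event names are assumptions made for illustration, and a Python predicate stands in for a real query language.

```python
class ContinuousQueryRegion:
    """Hypothetical region that re-evaluates registered queries on every update."""

    def __init__(self):
        self._data = {}
        self._queries = []  # (predicate, listener) pairs

    def register(self, predicate, listener):
        self._queries.append((predicate, listener))

    def put(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        for predicate, listener in self._queries:
            matched_before = old is not None and predicate(old)
            matches_now = predicate(value)
            if matches_now and not matched_before:
                listener("added", key, value)      # entered the result set
            elif matches_now and matched_before:
                listener("updated", key, value)    # changed within the result set
            elif matched_before:
                listener("removed", key, value)    # left the result set

events = []
region = ContinuousQueryRegion()
# Roughly "select trades where qty > 100", expressed as a predicate.
region.register(lambda trade: trade["qty"] > 100,
                lambda kind, key, value: events.append((kind, key)))

region.put("t1", {"qty": 50})    # does not match: no event
region.put("t1", {"qty": 150})   # now matches
region.put("t1", {"qty": 200})   # still matches
region.put("t1", {"qty": 10})    # no longer matches
```

The client never polls; it only reacts to the `added`, `updated` and `removed` notifications, which is what makes the push-oriented style possible.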

Application behavior can be parallelized and co-located with the data. By using the in-memory data and the processing power across the grid, more complex analysis can now be performed in a fraction of the time taken by traditional batch methods.
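As a rough illustration of running behavior next to the data, the sketch below sends one function to every partition in parallel and merges the partial results. The partition contents and function names are invented for the example, and threads in one process stand in for separate grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each node holds only its own slice of the trades.
partitions = {
    "node-a": [{"symbol": "ACME", "qty": 100}, {"symbol": "INIT", "qty": 40}],
    "node-b": [{"symbol": "ACME", "qty": 25}],
    "node-c": [{"symbol": "INIT", "qty": 60}, {"symbol": "ACME", "qty": 5}],
}

def local_totals(rows):
    """Runs against one node's local data only."""
    totals = {}
    for row in rows:
        totals[row["symbol"]] = totals.get(row["symbol"], 0) + row["qty"]
    return totals

def scatter_gather(parts, fn):
    # Scatter: run fn on every partition concurrently.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(fn, parts.values()))
    # Gather: merge the small partial results into one answer.
    merged = {}
    for partial in partials:
        for symbol, qty in partial.items():
            merged[symbol] = merged.get(symbol, 0) + qty
    return merged

totals = scatter_gather(partitions, local_totals)
print(totals)
```

Only the per-node summaries cross the (simulated) network, not the raw rows, which is why this style scales with the number of nodes.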

InfoQ: What are some forms of polyglot persistence supported by in-memory data grids (e.g. write-behind)?

The framework lets the application developer plug in interfaces to data services, files, RDBs and more. It supports three main patterns:

Read on miss – if an object is missing from the data grid, a "data loader" is invoked to fetch it.

Transactional "write-through" – changes are applied synchronously to the backend repository within the transaction. The update to the IMDG succeeds only if the update to the backend repository succeeds.

Asynchronous "write-behind" – updates to the grid are queued and written to the backend repository in batches. The queue can be configured to be replicated in memory for HA.
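The three patterns above can be sketched together in one small class. This is a simplified illustration under stated assumptions: `Backend` is a hypothetical stand-in for a database, `Region` is not any product's API, and a real write-behind queue would be drained by a background thread rather than an explicit `flush()` call.

```python
from collections import deque

class Backend:
    """Hypothetical stand-in for a database or other backend repository."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.fail_writes = False

    def load(self, key):
        return self.rows.get(key)

    def store(self, key, value):
        if self.fail_writes:
            raise OSError("backend unavailable")
        self.rows[key] = value

class Region:
    def __init__(self, backend, write_behind=False):
        self.backend = backend
        self.write_behind = write_behind
        self.cache = {}
        self.queue = deque()

    def get(self, key):
        # Read on miss: fall back to the data loader.
        if key not in self.cache:
            value = self.backend.load(key)
            if value is not None:
                self.cache[key] = value
            return value
        return self.cache[key]

    def put(self, key, value):
        if self.write_behind:
            # Write-behind: update the grid now, queue the backend write.
            self.cache[key] = value
            self.queue.append((key, value))
        else:
            # Write-through: if the backend write raises, the grid is not updated.
            self.backend.store(key, value)
            self.cache[key] = value

    def flush(self):
        # Drain the queue as one batch; entries stay queued if a write fails.
        written = 0
        while self.queue:
            key, value = self.queue[0]
            self.backend.store(key, value)
            self.queue.popleft()
            written += 1
        return written

db = Backend({"a": 1})
sync_region = Region(db)
loaded = sync_region.get("a")          # read on miss
sync_region.put("b", 2)                # write-through

db.fail_writes = True
try:
    sync_region.put("c", 3)
except OSError:
    pass                               # backend failed, so the grid was not updated
db.fail_writes = False

async_region = Region(db, write_behind=True)
async_region.put("d", 4)
pending_before_flush = "d" not in db.rows
flushed = async_region.flush()
```

The trade-off is visible in the last few lines: write-behind returns before the backend has the data, so the queue itself needs protection (for example the in-memory replication mentioned above) until it is flushed.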