Remember to stretch

Stretching is an important part of any keep-fit regime, but what about when it comes to deploying enterprise analytics platforms?

This blog post introduces the concept of a “Stretch Cluster” for running your analytics workloads. 

Use this approach to: 

Faced with increasing scrutiny from regulators and growing dependency from core business units, Enterprise Analytics platforms are often early in line to be migrated to the cloud to leverage its many benefits. However, under certain circumstances, making the move to someone else’s datacentre isn’t palatable or (easily) possible. 

Possible reasons preventing migration to the cloud:

Staying On-Premise 

If cloud deployment is not feasible – how can we leverage technology to bring cloud-like capabilities to an on-premise analytics deployment?  

The table below describes desirable features, their purpose and examples of the technologies which provide them:

| Feature | Purpose | Example |
| --- | --- | --- |
| Synchronous storage replication | Replication of data between 2 (or more) sites | Hitachi GAD, Spectrum Scale Replication |
| Clustered filesystem | High-performance parallel access (reads and writes) to data across multiple nodes/machines | IBM Spectrum Scale (GPFS) |
| Virtualisation | High availability, efficiency, scaling | VMware |
| Workload management | Control and placement of cluster and user jobs/processes | Spectrum LSF |
| Service orchestration | Control and placement of cluster services | LSF EGO |

Storage is the key technology that underpins a stretch cluster and enables an active/active configuration.

Active/Active Stretch Cluster 

A stretch cluster is a single entity, “stretched” between 2 geographically separate sites. In cloud parlance, these would be 2 availability zones. In terms of deployment, configuration, administration and licensing, a stretch cluster is a single deployment. A combination of technologies makes this configuration possible – however, the most important is synchronous storage replication.

Storage volumes are created on the SAN at each site and configured into a “GAD pair”, which enables synchronous replication.

Each volume pair is presented to each of the servers over multiple fibre channel connections and aggregated using multipathd.

Spectrum Scale NSDs are created by mapping each multipath device to an NSD; the NSDs are then grouped together to form a GPFS filesystem (see the sketch after this list).

Local path optimisation ensures reads/writes are performed from/to the closest SAN (during normal operation).

In the event of path/SAN failure – the secondary/backup path is used. This incurs a performance penalty but ensures continuity of service.

Orchestration ensures placement of jobs/processes to an appropriate node in the cluster.

Nodes can be closed to allow rolling maintenance/upgrades without interruption of service across the cluster.

Hitachi LUN snapshotting enables quick recovery of storage volumes to help reduce outage windows (RTO) in the event of issues.
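
To make the NSD step above more concrete, here is a minimal sketch (not from the original post) of generating a Spectrum Scale NSD stanza file from multipath devices, with one failure group per site. The device paths, server names and NSD names below are hypothetical; the layout follows the usual mmcrnsd-style stanza format, so adapt it to your own environment.

```python
# A minimal sketch (not from the original post) of generating a Spectrum Scale
# NSD stanza file from multipath devices, with one failure group per site.
# Device paths, server names and NSD names are hypothetical.

SITES = {
    # failureGroup 1 = site A, failureGroup 2 = site B (hypothetical names)
    1: {"servers": "nodeA1,nodeA2", "devices": ["/dev/mapper/mpatha", "/dev/mapper/mpathb"]},
    2: {"servers": "nodeB1,nodeB2", "devices": ["/dev/mapper/mpathc", "/dev/mapper/mpathd"]},
}

def nsd_stanzas(sites: dict) -> str:
    """Render one NSD stanza per multipath device, tagged with its site's failure group."""
    lines = []
    for failure_group, cfg in sites.items():
        for i, device in enumerate(cfg["devices"], start=1):
            lines += [
                "%nsd:",
                f"  nsd=site{failure_group}_nsd{i}",
                f"  device={device}",
                f"  servers={cfg['servers']}",
                "  usage=dataAndMetadata",
                f"  failureGroup={failure_group}",
                "",
            ]
    return "\n".join(lines)

if __name__ == "__main__":
    # The output would typically be fed to `mmcrnsd -F <stanza-file>`, followed by
    # `mmcrfs` to create the filesystem across the resulting NSDs.
    print(nsd_stanzas(SITES))
```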

Synchronous Data Replication 

To ensure data availability across 2 sites, some form of synchronous data replication is required; this may also be referred to as data mirroring. Each vendor has its own technologies and approaches, but the essential requirement is that the replication is synchronous: there can be no lag or delay in the replication (otherwise known as asynchronous replication), because without it the cluster cannot run in an active/active configuration and in effect all you have is an active/passive setup. Here we discuss the 2 synchronous data replication/mirroring options. The technologies available will depend on your estate and the distance between the locations the stretch cluster is configured across.
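
As a rough, back-of-the-envelope illustration of why the distance between sites matters (the figures below are assumptions, not from the post): a synchronous write cannot be acknowledged until the remote copy has confirmed it, so every write pays at least one inter-site round trip on top of the local write.

```python
# Illustrative only - rough estimate of the latency a synchronous write pays
# for the inter-site round trip. Assumed figures: ~5 microseconds per km
# one-way propagation in fibre, 0.5 ms local array write latency.

LOCAL_WRITE_MS = 0.5          # assumed local array write latency
ONE_WAY_US_PER_KM = 5.0       # rough one-way propagation delay in fibre

def sync_write_latency_ms(distance_km: float) -> float:
    """Local write plus one inter-site round trip for the remote acknowledgement."""
    round_trip_ms = 2 * distance_km * ONE_WAY_US_PER_KM / 1000
    return LOCAL_WRITE_MS + round_trip_ms

for km in (1, 10, 50, 100):
    print(f"{km:>3} km between sites -> ~{sync_write_latency_ms(km):.2f} ms per synchronous write")
```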

Storage Based Replication: Hitachi GAD 

The storage subsystem is responsible for handling the replication using dedicated, redundant, high-bandwidth cross-site links. Using GAD, a virtual storage machine is configured in the primary and secondary storage systems using the actual information of the primary storage system, and the global-active device primary and secondary volumes are assigned the same virtual LDEV number in the virtual storage machine. This enables the host to see the pair volumes as a single volume on a single storage system, and both volumes receive the same data from the host. Local path optimisation ensures that I/O is handled on the SAN local to the node during normal operation. In the event of path or storage array failure, the (secondary) remote path is used; this incurs a performance penalty due to the remote writes, but it ensures service continuity and therefore the ability to sustain uptime.

Hitachi Global-active Device (GAD) enables synchronous remote copies of data volumes which provide the following benefits:

An example GAD-enabled configuration is shown below. A third, geographically separate quorum site is required to act as a heartbeat for the GAD pair. A communication failure between systems results in a series of checks against the quorum disk to identify the problem.
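
As a purely conceptual sketch (this is not Hitachi's actual algorithm), the quorum's role in a split can be pictured like this: when a site loses contact with its peer, whether it can still reach the quorum disk decides whether it keeps serving I/O or suspends its volumes.

```python
# Conceptual illustration only - NOT Hitachi's implementation. It just shows the
# general shape of quorum-based tie-breaking when two sites lose contact.

def resolve_split(sees_peer: bool, sees_quorum: bool) -> str:
    """Decide what a site should do with its volumes after a communication check."""
    if sees_peer:
        return "continue (replication link healthy, no split to resolve)"
    if sees_quorum:
        return "continue serving I/O (peer assumed down or isolated)"
    return "suspend local volumes (this site may be the isolated one)"

# Example: the inter-site link is down, but only site A can still reach the quorum site.
print("Site A:", resolve_split(sees_peer=False, sees_quorum=True))
print("Site B:", resolve_split(sees_peer=False, sees_quorum=False))
```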

GPFS Replication

By configuring data volumes into 2 failure groups (1 per site) and enabling 2 data replicas, GPFS is made responsible for ensuring blocks are replicated between sites. Nodes at each site are connected to a site-specific SAN, and GPFS assigns block replicas to the distinct failure groups (i.e. the data is copied/mirrored to a distinct set of devices at each site), ensuring that nodes at the primary and secondary sites see the same data. The downside to this method is that replication is managed and processed by the server nodes, adding overhead. Replication is (usually) carried out over the LAN, which incurs performance penalties versus storage-based replication over a dedicated fabric. Data replication also doubles (or triples) the data footprint managed by GPFS, which increases licensing costs significantly when GPFS is running under a capacity-based licence. The rough arithmetic is sketched below.
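
A back-of-the-envelope comparison, using illustrative figures that are not from the post, of the capacity the filesystem (and a capacity-based licence) has to cover under each replication approach:

```python
# Illustrative figures only (not from the post): compare how much capacity GPFS
# manages - and a capacity-based licence must cover - under GPFS replication
# versus storage-based (e.g. Hitachi GAD) replication.

usable_tb = 500        # hypothetical amount of data you need to store
gpfs_replicas = 2      # two data replicas, one per site's failure group

# With GPFS replication, both copies live inside the filesystem, so GPFS
# manages the usable capacity multiplied by the replica count.
gpfs_managed_tb = usable_tb * gpfs_replicas

# With storage-based replication, the second copy is handled below the
# filesystem, so GPFS only manages the primary capacity.
gad_managed_tb = usable_tb

print(f"GPFS replication:    ~{gpfs_managed_tb} TB managed/licensed")
print(f"Storage replication: ~{gad_managed_tb} TB managed/licensed")
```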

Whichever replication option is selected, what this means in simple terms is that your data is available at both sites simultaneously, allowing services and workload to migrate between nodes while retaining access to the same data. Workload can also be run in parallel across multiple nodes regardless of location.