Backup and replication of your DynamoDB tables

We use Amazon’s DynamoDB (DDB) as part of our platform. As stated in the FAQ, AWS itself replicates the data across three facilities (Availability Zones, AZs) within a given region, to automatically cope with an outage of any of them. This is reassuring, and useful as an out-of-the-box safeguard, but you’ll probably want to go beyond this setup, depending on your high availability and disaster recovery requirements.

I have recently done some research and a few POCs on how best to achieve a solution in line with our current setup. We needed it to:

  • be as cost-effective as possible, while covering our needs
  • introduce the least possible complexity in terms of deployment and management
  • satisfy our current data backup needs, while leaving the door open to handling high availability in the near future.

There’s definitely some good literature on the topic online (1), besides the related AWS resources, but I’ve decided to write a series of posts which will hopefully provide a more practical view of the problem and the range of possible solutions.

In terms of high availability, probably your safest bet is to go with cross-region replication of your DDB tables. In a nutshell, this allows you to create replica tables of your master ones in a different AWS region. Luckily, AWS Labs provides an open-source implementation of this, hosted on GitHub. If you take a close look at the project’s README, you’ll notice it is implemented using the Kinesis Client Library (KCL) and works off DDB Streams, so streams need to be enabled on the tables you want to replicate, at least on the masters (replicas don’t need them).
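
As a side note, enabling the stream on a table boils down to a single UpdateTable call. The snippet below is a minimal sketch using the AWS SDK for Java; the table name and region are placeholders for your own, and NEW_AND_OLD_IMAGES is the stream view type that, as far as I can tell, the replication library expects.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.StreamSpecification;
import com.amazonaws.services.dynamodbv2.model.StreamViewType;
import com.amazonaws.services.dynamodbv2.model.UpdateTableRequest;

public class EnableMasterTableStream {

    public static void main(String[] args) {
        // Client pointed at the master table's region (placeholder region).
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1)
                .build();

        // Enable the stream on the master table (placeholder name), exposing
        // both old and new images so every change can be replayed on the replica.
        client.updateTable(new UpdateTableRequest()
                .withTableName("my-master-table")
                .withStreamSpecification(new StreamSpecification()
                        .withStreamEnabled(true)
                        .withStreamViewType(StreamViewType.NEW_AND_OLD_IMAGES)));
    }
}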

From what I’ve seen, there are several ways of meeting our data replication needs:

Using a CloudFormation template

The first option is to use a CloudFormation (CF) template to take care of setting up all the infrastructure you need to run the cross-region replication implementation mentioned above. If you’re not very familiar with CF, AWS describes it as:

AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.

Creating a stack with it is quite straightforward, and the wizard will let you configure the options shown in the screenshot below, plus some more advanced ones on the next screen, for which you can use the defaults in a basic setup.

[Screenshot: CloudFormation stack creation wizard, parameters screen]

Using this template takes care of creating everything, from your IAM roles and Security Groups to launching the EC2 instances defined to perform the job. One of those instances coordinates the replication, and the other(s) run the actual replication process (i.e. the KCL worker processes). The worker instances are implicitly defined as part of an Auto Scaling group, so as to guarantee they are always running; otherwise, events from the DDB stream could go unprocessed, which would lead to data loss.

I couldn’t fully test this method: after CF finished setting everything up, I couldn’t use the ReplicationConsoleURL to configure the master/replica tables due to the AWS error below. In any case, I wanted more fine-grained control over the process, so I looked into the next option.

[Screenshot: AWS error shown when opening the ReplicationConsoleURL]

Manually creating your AWS resources and running the replication process

This basically implies doing yourself most of what CF would otherwise do on your behalf, so it means quite a bit more work in terms of infrastructure configuration, be it through the AWS console or as part of your standard environment deployment process.

I believe this would be a valid scenario in case you want to use your existing AWS resources to run the worker processes. You’ll need to weigh up your cost restrictions and computing resource needs before considering this a valid approach. In our case, it would help us on both fronts, so I decided to explore it further.

Given that we already have EC2 resources set up as part of our deployment process, I decided to create a simple bash script that kicks off the replication process as part of our deployment. It basically takes care of installing the required OS dependencies, cloning the git repo and building it, and then executing the process. It requires 4 arguments to be provided (source region/table and target region/table). Obviously, it doesn’t perform any setup on your behalf, so the tables passed as arguments need to exist in the specified regions, and the source table must have a stream enabled.

This proved to be a simple enough approach, and it worked as expected. The only downside is that, even though it runs within our existing EC2 fleet, we still need to figure out a mechanism for monitoring the worker process, so it can be restarted if it dies for any reason and the data loss mentioned above is avoided. It’s definitely an approach we might end up using in the near future.

Using Lambda to process the DDB stream events

This method takes the same approach as the above, in that it relies on events from your DDB table streams, but it removes the need to take care of the AWS compute resources required to process them. You will still need to handle some infrastructure and write the Lambda function that performs the actual replication, but it definitely helps with the cost and simplicity requirements mentioned in the introduction.
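
To give a rough idea of the shape such a function could take, here is a minimal, hypothetical sketch built on the aws-lambda-java-events types; the handler class, replica table name and target region are placeholders of my own, not something provided by AWS.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;

public class ReplicationHandler implements RequestHandler<DynamodbEvent, Void> {

    // Placeholder replica table, living in another (placeholder) region.
    private static final String REPLICA_TABLE = "my-replica-table";

    private final AmazonDynamoDB replicaClient = AmazonDynamoDBClientBuilder.standard()
            .withRegion(Regions.US_EAST_1)
            .build();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbStreamRecord record : event.getRecords()) {
            switch (record.getEventName()) {
                case "INSERT":
                case "MODIFY":
                    // With NEW_AND_OLD_IMAGES enabled, the record carries the full new item.
                    replicaClient.putItem(REPLICA_TABLE, record.getDynamodb().getNewImage());
                    break;
                case "REMOVE":
                    replicaClient.deleteItem(REPLICA_TABLE, record.getDynamodb().getKeys());
                    break;
                default:
                    break;
            }
        }
        return null;
    }
}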

I’ll leave the details of this approach for the last post of this series though, as it is quite a broad topic in its own right.

In the upcoming posts I will discuss the overall solution we ended up going with, but before getting to that, in my next post I will cover how to back up your DDB tables to S3.

Stay tuned!


Dependency resolution with Eclipse Aether

Most Java developers deal with dependency resolution only at build time – for example, they declare dependencies in their Maven POM and Maven downloads all the required dependencies during a build, caching them in a local repository so that it won’t have to download them again the next time. But what if you need to do this dependency resolution at run-time? How do you do that? It turns out to be rather straightforward, and it’s done using the same library that Maven uses internally (at least Maven 3.x).

Transitive Dependency Resolution at Run-time

That library is Aether (which was contributed to Eclipse by Sonatype). Doing basic transitive dependency resolution requires you to set up some Aether components – the pieces are readily available in three Aether Wiki pages:

  • Getting Aether (you don’t need all the dependencies listed there if you’re just doing basic resolution)
  • Setting Aether up (the code in the newRepositorySystem() method) – IMPORTANT: for the custom TransporterFactory described in the Wiki page to work properly, you will have to add the TransporterFactory entries before the BasicRepositoryConnectorFactory, unlike in the Wiki (see the sketch after this list)
  • Creating a Repository System Session (the code in the newSession(...) method)

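Putting those pieces together, my newRepositorySystem() ends up looking roughly like the sketch below (it follows the Wiki code, with the registration order changed as per the note above):

private RepositorySystem newRepositorySystem() {
    DefaultServiceLocator locator = MavenRepositorySystemUtils.newServiceLocator();

    // As noted above, register the TransporterFactory entries *before* the
    // BasicRepositoryConnectorFactory, unlike in the Wiki code.
    locator.addService(TransporterFactory.class, FileTransporterFactory.class);
    locator.addService(TransporterFactory.class, HttpTransporterFactory.class);
    locator.addService(RepositoryConnectorFactory.class, BasicRepositoryConnectorFactory.class);

    return locator.getService(RepositorySystem.class);
}
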
Now that you have the repositorySystem and a session, you can use the following code to get the full set of transitive dependencies for a particular artifact, given by its Maven coordinates:

private CollectRequest createCollectRequest(String groupId, String artifactId, String version, String extension) {
    Artifact targetArtifact = new DefaultArtifact(groupId, artifactId, extension, version);
    RemoteRepository centralRepository = new RemoteRepository.Builder("central", "default", "http://repo1.maven.org/maven2/").build();

    CollectRequest collectRequest = new CollectRequest();
    collectRequest.setRoot(new Dependency(targetArtifact, "compile"));
    collectRequest.addRepository(centralRepository);

    return collectRequest;
}

private List<Artifact> extractArtifactsFromResults(DependencyResult resolutionResult) {
    List<ArtifactResult> results = resolutionResult.getArtifactResults();
    ArrayList<Artifact> artifacts = new ArrayList<>(results.size());

    for (ArtifactResult result : results) {
        artifacts.add(result.getArtifact());
    }

    return artifacts;
}

public List<Artifact> resolve(String groupId, String artifactId, String version, String extension) throws DependencyResolutionException {
    CollectRequest collectRequest = createCollectRequest(groupId, artifactId, version, extension);

    DependencyResult resolutionResult = repositorySystem.resolveDependencies(session,
            new DependencyRequest(collectRequest, null));

    return extractArtifactsFromResults(resolutionResult);
}

That gets you the full set of transitive dependencies.
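
As a quick usage example (the coordinates below are just an arbitrary illustration):

List<Artifact> artifacts = resolve("org.apache.commons", "commons-lang3", "3.4", "jar");

for (Artifact artifact : artifacts) {
    // Each resolved artifact has its file set, pointing into the local repository.
    System.out.println(artifact + " -> " + artifact.getFile());
}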

Customizing the Resolution

What we’ve done so far is get Aether to grab artifacts from Maven Central (you will have noticed how we configured the CollectRequest with the centralRepository as the only one to consult). Adding other remote repositories would be done in the same way as adding central. But let’s say we wanted to have a more direct say in how artifacts are retrieved. For example, maybe we want to get artifacts from an AWS S3 bucket, or perhaps we want to generate artifact content at run-time. In order to do that, we need to create a new Aether Transporter and hook it into the repository system we set up above.

Let’s consider a basic implementation:

public class CustomTransporter extends AbstractTransporter {
 private static final Exception NOT_FOUND_EXCEPTION = new Exception("Not Found");
 private static final byte[] pomContent =
  ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
   "<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n" +
   " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
   " xsi:schemaLocation=\"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd\">\n" +
   " <modelVersion>4.0.0</modelVersion>\n" +
   " <groupId>custom.group</groupId>\n" +
   " <artifactId>custom-artifact</artifactId>\n" +
   " <version>1.0</version>\n" +
   " <packaging>jar</packaging>\n" +
   "</project>\n").getBytes();

 public CustomTransporter() {}

 @Override
 public int classify(Throwable error) {
  if (error == NOT_FOUND_EXCEPTION) {
   return ERROR_NOT_FOUND;
  }

  return ERROR_OTHER;
 }

 @Override
 protected void implClose() {}

 @Override
 protected void implGet(GetTask task) throws Exception {
  if (task.getLocation().toString().contains("custom/group/custom-artifact/1.0") &&
   task.getLocation().getPath().endsWith(".pom")) {
   utilGet(task, new ByteArrayInputStream(pomContent), true, -1, false);
   return;
  }

  throw NOT_FOUND_EXCEPTION;
 }

 @Override
 protected void implPeek(PeekTask task) throws Exception {
  if (task.getLocation().toString().contains("custom/group/custom-artifact/1.0") &&
   task.getLocation().getPath().endsWith(".pom")) {
   return;
  }

  throw NOT_FOUND_EXCEPTION;
 }

 @Override
 protected void implPut(PutTask task) throws Exception {
  throw new UnsupportedOperationException();
 }
}

Your transporter is going to be invoked with get, peek and put tasks by the Aether code. The main ones to worry about here are the get & peek requests. The peek task is designed to check if an artifact exists, and the get task is used to retrieve artifact content. The peek task should return without an exception if the artifact identified by the task exists, or throw an exception if the artifact doesn’t exist. The get task should return the artifact content (here we show how that’s done using the utilGet method) if the artifact exists, and throw an exception otherwise.

Note how the classify method is used to actually determine if the exception you throw in the other methods indicates that an artifact is non-existent. If you classify an exception thrown by the other methods as an ERROR_NOT_FOUND, Aether will consider that artifact as non-existent, while ERROR_OTHER will be treated as some other error.

Now that we have a transporter, hooking it up requires us to first create a TransporterFactory corresponding to it:

public class CustomTransporterFactory implements TransporterFactory, Service {
 private float priority = 5;

 public void initService(ServiceLocator locator) {}

 public float getPriority() {
  return priority;
 }

 public CustomTransporterFactory setPriority(float priority) {
  this.priority = priority;
  return this;
 }

 public Transporter newInstance(RepositorySystemSession session, RemoteRepository repository)
 throws NoTransporterException {
  return new CustomTransporter();
 }
}

Nothing much to say about that – it’s pretty much boilerplate. If you need to pass some information to your transporter about its context, do it when you construct the transporter in the newInstance method here.

Finally, hook up the custom TransporterFactory the same way all the others are hooked up – that is, add it when we construct RepositorySystem:

locator.addService(TransporterFactory.class, CustomTransporterFactory.class);

IMPORTANT: Note again the TransporterFactory needs to be added before the BasicRepositoryConnectorFactory.

That’s it – from now on, our transporter will also get to be involved in resolving all artifacts!

Using TemplateRef to create a tooltip/popover directive in Angular 2

This post is about a Component created in the context of our application development. There is a demo here, and you can find the full source code here.

Lately, the need arose to create a tooltip directive. This brought up a lot of questions we hadn’t had to face before, such as how to create markup wrapping around rendered content, or rather “what is Angular 2’s transclude?”

Turns out, using TemplateRef is very useful for this, but the road to understanding it wasn’t easy. After seeing it used in a similar fashion by Ben Nadel, I decided to take a stab at it.

TemplateRef comes into play when using <template> elements, or perhaps most commonly when using *-prefixed directives such as NgFor or NgIf. For *-prefixed directives (or directives in <template> elements), the TemplateRef can be injected straight into the constructor of the class. In other components, however, it can be queried via something like the ContentChild decorator.

Initially, I had thought to create two directives: a TooltipDirective to be placed on the parent element, plus a TooltipTemplate directive to be placed in a template, which would then inject itself into the parent. That proved too complex, though, and after finding out what could be done with a ContentChild query, the implementation became much simpler.

The end result looks like this (simplified for clarity):

@Directive({
    selector: "[tooltip]"
})
export class TooltipDirective implements OnInit {
    @Input("tooltip") private tooltipOptions: any;
    @ContentChild("tooltipTemplate") private tooltipTemplate: TemplateRef < Object > ;

    private tooltip: ComponentRef < Tooltip > ;
    private tooltipId: string;

    constructor(
        private viewContainer: ViewContainerRef,
        public elementRef: ElementRef,
        private componentResolver: ComponentResolver,
        private position: PositionService) {
        this.tooltipId = _.uniqueId("tooltip");
    }

    ngOnInit() {
        // Attach relevant events
    }

    private showTooltip() {
        if (this.tooltipTemplate) {
            this.componentResolver.resolveComponent(Tooltip)
                .then(factory => {
                    this.tooltip = this.viewContainer.createComponent(factory);
                    this.tooltip.instance["content"] = this.tooltipTemplate;
                    this.tooltip.instance["parentEl"] = this.elementRef;
                    this.tooltip.instance["tooltipOptions"] = this.options;
                });
        }
    }

    private hideTooltip() {
        this.tooltip.destroy();
        this.tooltip = undefined;
    }

    private get options(): TooltipOptions {
        return _.defaults({}, this.tooltipOptions || {}, defaultTooltipOptions);
    }
}

@Component({
    selector: "tooltip",
    template: `<div class="inner">
<template [ngTemplateOutlet]="content"></template>
</div>
<div class="arrow"></div>`
})
class Tooltip implements AfterViewInit {
    @Input() private content: TemplateRef<Object>;
    @Input() private parentEl: ElementRef;
    @Input() private tooltipOptions: TooltipOptions;

    constructor(
        private positionService: PositionService,
        public elementRef: ElementRef) {}

    private position() {
        // top and left calculated and set
    }

    ngAfterViewInit(): void {
        this.position();
    }
}

The TooltipDirective requires a <template #tooltipTemplate> element, which gets rendered through a Tooltip component that is created and injected with the TemplateRef containing our content – essentially “transcluding” it. The Tooltip component’s role is only to wrap the content with some light markup, and to position itself when inserted into the page.

A lot of the actual positioning (not shown here, but in the source code) is done directly on the rendered elements, though – I faced some issues when using the host properties object, which I believe were reintroduced in the latest RC.

All in all, it was a great learning experience, and Angular 2’s <template> surely beats Angular.js’ transclude. Slowly but surely Angular 2 gets more and more demystified to me, but it is hard work getting there.