Running Linux stuff on Windows in 2017

Back in the day, as a developer who worked on Windows, I used to dread building projects that used Makefiles and had Unix shell scripts in their build process. It meant installing Cygwin or MinGW to get GNU tools that worked on Windows, then running the build and hoping it all magically worked.

Today, MinGW and Cygwin remain options for building Linux projects, running Bash scripts, and just using various Linux tools on Windows.

Windows Subsystem for Linux

But there are better options. If you are on Windows 10, since the Fall Creators Update, the Windows Subsystem for Linux means you can get a pretty full-fledged Linux environment on Windows. Follow the guide here to get it set up on your Windows 10 machine. From within the Linux environment, you have access to all the files on your machine, and you can install most Linux tools in the environment so that you can build projects, run Bash scripts, etc. And you’re not running a VM with this option.

VMs and Docker

Of course, a VM running Linux has always been an option, but things have gotten a lot better on this front in 2017. Hardware-assisted virtualization is now common, which means VMs run at near-native speed. Docker has also improved things quite a bit by managing the VM on Windows for you, so that you may not even realize you’re running one. If you are on an edition of Windows 10 that includes Hyper-V, Docker for Windows will use the Hyper-V technology built into Windows to run the VM, so you don’t need an external hypervisor like VirtualBox. And if you’re on Windows Home (which doesn’t have Hyper-V support), you can use Docker Toolbox, which takes care of installing VirtualBox for you.

With Docker, running Linux on Windows turns into running a sandboxed container. You can run Linux scripts, build projects with Makefiles, and do it all within a container that you can simply delete once you’re done.

Sandbox

Finally, there’s Sandbox. It’s Docker, but more convenient. It’s a single command-line binary that will set up Docker for you before it runs Linux containers. It’s small enough to be checked in to your source control repo. That means if you’re on a team with other developers who like writing Linux scripts, you’re not going to have to translate them into Windows batch scripts. You can simply put Sandbox into the repo, and when you check out the repo, you can run those Linux scripts directly.

Here’s a quick example to show how this looks when Windows developers have to deal with Linux developers. Imagine you’re a Windows developer on a team working on the following C project, which uses a Makefile:

https://github.com/stackfoundation/sbox-makefiles

Go ahead and clone the project repo, and just run the following to build the application:

git clone https://github.com/stackfoundation/sbox-makefiles
cd sbox-makefiles
./sbox run build-app

Or run the following to build and run the application:

./sbox run run-app

Or the following to run a bash script:

./sbox run hello-script

That’s how simple it can be to run Linux stuff on Windows in 2017.

 

Fully automating the daily developer humdrum

In the beginning, and every day, the toil

Do you remember those first few days when you joined a new software project? Perhaps it was at your current job? Or maybe when you started contributing to an open source project? Or maybe it was when you decided to get your hands dirty with the latest fad framework?

Even in 2017, the place you often start is by going through a list of step-by-step instructions in a wiki, a README, or given to you by a fellow developer. Usually, you need to install this, then install that, put this file here, and run this script there.  It’s pretty tedious, error-prone, and often requires improvisation to account for things changing.

And of course, it’s not just at the start of a project. There are tedious tasks that you have to perform daily as a developer. Building your application, running unit tests, running functional tests, setting up seed data, creating certificates, generating versioned packages, setting up infrastructure, deploying your application, the list goes on.

Announcing Sandbox

We’re very proud to announce that today, we’re ready to show off what we’ve been working on for the past few months at StackFoundation: a developer tool called Sandbox. Sandbox is a tool for running Docker-based workflows which reliably automate your day-to-day development chores. Whether it’s building your application, running your automated tests, deploying your application, or any other mundane task you perform regularly, Sandbox can help you create new scripted workflows, or make your existing scripts for these tasks more reliable. If you’re a developer, we built Sandbox to be useful for you every day.

A Pivot

We built Sandbox to address our own needs as developers. We built a tool for ourselves, which is why we think it will be very useful for other developers. But we’ve been wrong before about what others might find useful. For anyone who has seen StackFoundation’s earlier days, and our earlier work, this is a shift in product for us. We spent a bit over a year on our first product, and we were wrong about its appeal. That’s why, this time, we are releasing early.

It’s in Alpha

We’re releasing early, and we’re releasing while Sandbox is still a bit rough around the edges. We are doing this because we want to hear from developers, and we need help from developers.

If you are a developer, we would love it if you could try running a simple workflow with Sandbox, and tell us how it goes. There are a lot of platform-specific nuances that we need help getting right – if Sandbox fails to run on your machine, we especially want to hear from you. But we want to hear from you even if it all works as designed. Leave a comment on the blog, tweet at us, file an issue on our GitHub issue tracker, find us on Gitter, or drop us an email. We want to hear what you think.

And just for anyone wondering: Sandbox is free, not just now in Alpha, but also forever in the future.

10 tips for migrating from Maven to Gradle

Here’s a quick list of 10 lessons we learned when making the switch from Maven to Gradle for StackFoundation. Coming from a deeply Maven place, these are the things that gave us an “Aha! That’s how you do it in Gradle!” moment. As with any other tool, these are not the only ways (nor necessarily the best) to do these things with Gradle – this is not meant to be a prescriptive list of best practices. Rather, it’s just a few things to help those Mavenistas out there who are thinking of switching to Gradle, or actively switching, and figuring out how to get their minds to think Gradle.

1) Forget the GAV, use the directory layout!

Maven folks are used to thinking about a module’s GAV – its group, artifact, and version. When you switch to Gradle, you don’t have to think about this so much. Gradle names projects based on the names of their directories by default. So if you have the following multi-project directory structure:

  • server
    • core
      • src/main/java
    • logging
      • src/main/java

These projects are named server, core, and logging. In Gradle, projects are identified by a fully-qualified path – in this case the paths are going to be :server, :server:core and :server:logging.

Note: You can give projects a group, and version, if you want.

2) Build everything from the root!

More than a GAV, when you start using Gradle regularly, you’ll start thinking of projects by their path.

In Maven, you’re probably used to switching to a particular sub-module directory and then invoking mvn clean install, etc. from there. In Gradle, you’re going to kick off all builds from the root of your multi-project setup. And you can simply use a sub-project’s path to kick off a task for that project. For example, you can invoke gradlew :server:logging:build to build the logging sub-project, within the top-level server project.

3) Use custom tasks!

In Maven, if you need to perform some custom logic for a build or deploy, you go hunting for a particular plug-in, and then see if invoking one of its goals at some spot within the fixed Maven build lifecycle accomplishes what you want. If not, you try to look for another plug-in. And another, and then you might try writing one yourself.

Gradle is fundamentally built around tasks. You’re going to end up writing a custom task for a lot of what you want to do. Build a package by combining things in a specific way? Write a task. Deploy a service? Write a task. Set up infrastructure? Write a task. And remember, all Gradle scripts are Groovy scripts, so you are writing Groovy code when writing your tasks. Most of the time, you won’t write a task from scratch – you’ll start with a plug-in (yes, like in Maven, you’ll start by searching for plug-ins), and one of the tasks it defines, and then customize it!

4) Name your tasks, give them a group and description!

If you have a complex Maven project, you are very likely using a number of profiles, and you will probably have a specific order to build things, and maybe even a specific order to run things with different profiles activated. You’ll end up documenting this on a wiki or a README file in your Git repo. And then you’ll forget to update that document, so that eventually, exactly how something is built becomes tribal knowledge.

In Gradle, you create custom tasks. This was already point 3 – but once you create them, you can give these tasks a group and a description. We give our most important custom tasks the group name ‘StackFoundation’. That way, when we run gradlew tasks, we see a list of tasks specific to our project in the list of available tasks to run. A great way to document our tasks.

5) Alias tasks, name them something you will remember!

Picking up from 3 and 4: you can create a task just to alias another task defined by another plugin. For example, the Shadow plugin is the Gradle version of the Maven Shade plugin. You might be happy with the default shadowJar task it provides, but if, in your project, a more meaningful name for creating that shadow JAR is createServicePackage, you can create an alias:

task createServicePackage(dependsOn: shadowJar)

Note: It’s not exactly an alias, but close enough.

6) The Shadow plugin is the Gradle version of the Maven Shade plugin

This one is used by enough Maven folks that it’s worth repeating.

7) Use the Gradle wrapper

With Maven, you have to get everyone to set up Maven, or use an IDE that comes with Maven built in, in order to run builds for your project. With Gradle, there’s the Gradle wrapper – and you’re meant to check it in to your team’s repo. Set up your project to use the wrapper, and put it in your source control repo! Your team won’t have to think about getting Gradle.

8) Forget the inheritance parent, use external build scripts to define common tasks

In Maven, you use an inheritance parent to manage dependencies, and plugins.

With Gradle, you can reference other Gradle files from a build.gradle file – you do that using something that looks like: apply from: '../../gradle/services.gradle'. These are called external build scripts, and there are some caveats to using them, but they’re a great way to define common tasks. For example, you can create some common tasks for deploying any of the services you use in your projects inside gradle/services.gradle and reference them from your other Gradle files.

Note: You can also put common task code inside buildSrc.

9) Forget the inheritance parent, create custom libraries

In Maven, you use a parent POM to define common dependencies. With Gradle, you can define common dependencies by putting them in an external build script (described in point 8). Here’s an example of a file in gradle/dependencies.gradle which defines some common libraries we use in all of our projects:

repositories {
    mavenLocal()
    mavenCentral()
}

ext {
    libraries = [
            aws            : {
                it.compile('com.amazonaws:aws-java-sdk-s3:1.11.28') {
                    exclude group: 'org.apache.httpcomponents', module: 'httpclient'
                    exclude group: 'com.fasterxml.jackson.core', module: 'jackson-annotations'
                    exclude group: 'com.fasterxml.jackson.core', module: 'jackson-core'
                    exclude group: 'com.fasterxml.jackson.core', module: 'jackson-databind'
                    exclude group: 'com.fasterxml.jackson.dataformat', module: 'jackson-dataformat-cbor'
                }
            },
            awsEcr         : 'com.amazonaws:aws-java-sdk-ecr:1.11.28',
            datamill       : {
                it.compile('foundation.stack.datamill:core:0.1.1-SNAPSHOT') {
                    exclude group: 'org.apache.httpcomponents', module: 'httpclient'
                }
            },
            datamillLambda : 'foundation.stack.datamill:lambda-api:0.1.1-SNAPSHOT',
            junit5 : [
                'org.junit.jupiter:junit-jupiter-api:5.0.0-M4',
                'org.junit.jupiter:junit-jupiter-migration-support:5.0.0-M4'
            ]
    ]
}

Note the use of GAVs to refer to Maven dependencies, and how you can setup exclusions using this approach. With this approach, we get to give our own names to these libraries instead of referring to everything with GAVs. This is especially great for us because colloquially, we refer to our dependencies using these names and this makes looking at project dependency information clear and concise. In addition, we can group multiple Maven dependencies into one custom user library, as with the junit5 example.

Here’s how a particular project defines the libraries as dependencies:

dependencies {
    compile libraries.datamill(it)
    testCompile libraries.junit5
}

10) Doing resource filtering

In Maven, you probably use resource filtering to replace property placeholders in resource files. There are two equivalents in Gradle – the first is to use ReplaceTokens:

processResources {
    def props = [imageVersion: 'unspecified']
    filesMatching('*.properties') {
        filter(org.apache.tools.ant.filters.ReplaceTokens, tokens: props)
    }
}

This looks for placeholders of the form @imageVersion@, i.e., they’re delimited by @’s. It tolerates missing property names. A second form looks like this:

processResources {
    def props = [imageVersion: 'unspecified']
    filesMatching("**/*.yaml") {
        expand props
    }
}

This looks for property placeholders of the form $imageVersion – well, sort of. It’s actually using a template mechanism in Groovy, which makes it very powerful, but if you use it for simple cases, you’ll probably encounter the following: if a simple placeholder references a missing property, your build will fail with an error!

That’s all for now! More lessons from our experience migrating to Gradle at StackFoundation will come in a future post. Hope that helps those of you making the switch from Maven!

 

Dependency resolution with Eclipse Aether

Most Java developers deal with dependency resolution only at build time – for example, they declare dependencies in their Maven POM, and Maven downloads all the required dependencies during a build, caching them in a local repository so that it won’t have to download them again the next time. But what if you need to do this dependency resolution at run-time? How do you do that? It turns out to be rather straightforward, and it’s done using the same library that Maven uses internally (at least Maven 3.x).

Transitive Dependency Resolution at Run-time

That library is Aether (which was contributed to Eclipse by Sonatype). Doing basic transitive dependency resolution requires you to set up some Aether components – the pieces are readily available within 3 Aether Wiki pages:

  • Getting Aether (you don’t need all the dependencies listed there if you’re just doing basic resolution)
  • Setting Aether up (the code in the newRepositorySystem() method) – IMPORTANT: For the custom TransporterFactory described in the Wiki page to work properly, you will have to add the TransporterFactory instances before the BasicRepositoryConnectorFactory, unlike in the Wiki
  • Creating a Repository System Session (the code in the newSession(...) method) – a sketch of both methods follows this list
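
Pieced together from those Wiki pages, the setup looks roughly like this – a minimal sketch, with the local repository location chosen just as an example:

private RepositorySystem newRepositorySystem() {
    DefaultServiceLocator locator = MavenRepositorySystemUtils.newServiceLocator();
    // Note the ordering: the TransporterFactory registrations go in before the
    // BasicRepositoryConnectorFactory, per the IMPORTANT note above
    locator.addService(TransporterFactory.class, FileTransporterFactory.class);
    locator.addService(TransporterFactory.class, HttpTransporterFactory.class);
    locator.addService(RepositoryConnectorFactory.class, BasicRepositoryConnectorFactory.class);
    return locator.getService(RepositorySystem.class);
}

private DefaultRepositorySystemSession newSession(RepositorySystem repositorySystem) {
    DefaultRepositorySystemSession session = MavenRepositorySystemUtils.newSession();
    // Resolved artifacts get cached in this local repository between runs
    LocalRepository localRepository = new LocalRepository("local-repo");
    session.setLocalRepositoryManager(repositorySystem.newLocalRepositoryManager(session, localRepository));
    return session;
}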

Now that you have the repositorySystem and a session, you can use the following code to get the full set of transitive dependencies for a particular artifact, given by its Maven coordinates:

private CollectRequest createCollectRequest(String groupId, String artifactId, String version, String extension) {
    Artifact targetArtifact = new DefaultArtifact(groupId, artifactId, extension, version);
    RemoteRepository centralRepository = new RemoteRepository.Builder("central", "default", "http://repo1.maven.org/maven2/").build();

    CollectRequest collectRequest = new CollectRequest();
    collectRequest.setRoot(new Dependency(targetArtifact, "compile"));
    collectRequest.addRepository(centralRepository);

    return collectRequest;
}

private List<Artifact> extractArtifactsFromResults(DependencyResult resolutionResult) {
    List<ArtifactResult> results = resolutionResult.getArtifactResults();
    ArrayList<Artifact> artifacts = new ArrayList<>(results.size());

    for (ArtifactResult result : results) {
        artifacts.add(result.getArtifact());
    }

    return artifacts;
}

public List<Artifact> resolve(String groupId, String artifactId, String version, String extension) throws DependencyResolutionException {
    CollectRequest collectRequest = createCollectRequest(groupId, artifactId, version, extension);

    DependencyResult resolutionResult = repositorySystem.resolveDependencies(session,
    new DependencyRequest(collectRequest, null));

    return extractArtifactsFromResults(resolutionResult);
}

That gets you the full set of transitive dependencies.
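
If the methods above live on a small resolver class that holds the repositorySystem and the session, using it might look like this (a hypothetical usage example – resolver is whatever instance you put those methods on):

// Resolve Guava 19.0 and everything it pulls in transitively
List<Artifact> artifacts = resolver.resolve("com.google.guava", "guava", "19.0", "jar");
for (Artifact artifact : artifacts) {
    // Each resolved artifact has already been downloaded to the local repository
    System.out.println(artifact + " -> " + artifact.getFile());
}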

Customizing the Resolution

What we did so far is get Aether to grab artifacts from Maven Central (you will have noticed how we configured the CollectRequest with the centralRepository as the only repository to consult). Adding other remote repositories would be done in the same way as adding central. But let’s say we wanted to have a more direct say in how artifacts are retrieved. For example, maybe we want to get artifacts from an AWS S3 bucket, or perhaps we want to generate artifact content at run-time. In order to do that, we need to create a new Aether Transporter, and hook it into the repository system we set up above.

Let’s consider a basic implementation:

public class CustomTransporter extends AbstractTransporter {
 private static final Exception NOT_FOUND_EXCEPTION = new Exception("Not Found");
 private static final byte[] pomContent =
  ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
   "<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n" + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" + " xsi:schemaLocation=\"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd\">\n" +
   " <modelVersion>4.0.0</modelVersion>\n" +
   " <groupId>custom.group</groupId>\n" +
   " <artifactId>custom-artifact</artifactId>\n" +
   " <version>1.0</version>\n" +
   " <packaging>jar</packaging>\n" +
   "</project>\n").getBytes();

 public CustomTransporter() {}

 @Override
 public int classify(Throwable error) {
  if (error == NOT_FOUND_EXCEPTION) {
   return ERROR_NOT_FOUND;
  }

  return ERROR_OTHER;
 }

 @Override
 protected void implClose() {}

 @Override
 protected void implGet(GetTask task) throws Exception {
  if (task.getLocation().toString().contains("custom/group/custom-artifact/1.0") &&
   task.getLocation().getPath().endsWith(".pom")) {
   utilGet(task, new ByteArrayInputStream(pomContent), true, -1, false);
   return;
  }

  throw NOT_FOUND_EXCEPTION;
 }

 @Override
 protected void implPeek(PeekTask task) throws Exception {
  if (task.getLocation().toString().contains("custom/group/custom-artifact/1.0") &&
   task.getLocation().getPath().endsWith(".pom")) {
   return;
  }

  throw NOT_FOUND_EXCEPTION;
 }

 @Override
 protected void implPut(PutTask task) throws Exception {
  throw new UnsupportedOperationException();
 }
}

Your transporter is going to be invoked with get, peek and put tasks by the Aether code. The main ones to worry about here are the get & peek requests. The peek task is designed to check if an artifact exists, and the get task is used to retrieve artifact content. The peek task should return without an exception if the artifact identified by the task exists, or throw an exception if the artifact doesn’t exist. The get task should return the artifact content (here we show how that’s done using the utilGet method) if the artifact exists, and throw an exception otherwise.

Note how the classify method is used to actually determine if the exception you throw in the other methods indicates that an artifact is non-existent. If you classify an exception thrown by the other methods as an ERROR_NOT_FOUND, Aether will consider that artifact as non-existent, while ERROR_OTHER will be treated as some other error.

Now that we have a transporter, hooking it up requires us to first create a TransporterFactory corresponding to it:

public class CustomTransporterFactory implements TransporterFactory, Service {
 private float priority = 5;

 public void initService(ServiceLocator locator) {}

 public float getPriority() {
  return priority;
 }

 public CustomTransporterFactory setPriority(float priority) {
  this.priority = priority;
  return this;
 }

 public Transporter newInstance(RepositorySystemSession session, RemoteRepository repository)
 throws NoTransporterException {
  return new CustomTransporter();
 }
}

Nothing much to say about that – it’s pretty boilerplate. If you need to pass some information to your transporter about its context, do it when you construct the transporter in the newInstance method here.

Finally, hook up the custom TransporterFactory the same way all the others are hooked up – that is, add it when we construct RepositorySystem:

locator.addService(TransporterFactory.class, CustomTransporterFactory.class);

IMPORTANT: Note again the TransporterFactory needs to be added before the BasicRepositoryConnectorFactory.
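
Putting that together with the rest of the setup, the registration order would look something like this (HttpTransporterFactory is shown only as an example of another transporter you might also register):

// The custom factory (and any other TransporterFactory) is registered before
// the BasicRepositoryConnectorFactory, per the note above
locator.addService(TransporterFactory.class, CustomTransporterFactory.class);
locator.addService(TransporterFactory.class, HttpTransporterFactory.class);
locator.addService(RepositoryConnectorFactory.class, BasicRepositoryConnectorFactory.class);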

That’s it – our transporter will now get to be involved in resolving all artifacts!

3 Docker tips & tricks

Over the past few months, we’ve done a lot of development with Docker. There are a few things that we end up using over and over. I wanted to share three of these with other developers working with Docker:

  1. Remove all containers – Inevitably, during development you’re going to pile up a bunch of stale containers that are just lying around – or maybe you have a bunch of running containers that you don’t use. We end up needing to wipe out all the containers to start fresh all the time. Here’s how we do it:

    docker ps -a -q | awk '{print $1}' | xargs --no-run-if-empty docker rm -f


    It’s pretty self-explanatory – it lists all the containers, and then removes each one by its ID. There are several incarnations of this, but this one has the advantage that it can be used on Windows as well if you install UNIX command-line tools (you could do that by grabbing MinGW, for example). Alternatively, on Windows, you can use: FOR /f "tokens=*" %i IN ('docker ps -a -q') DO docker rm -f %i
  2. Mount the Docker Unix socket as a volume – OK, the way we use Docker is a bit more advanced than the standard use cases, but it’s crazy how often we end up using this one. That’s because we always end up having to create Docker containers from within a Docker container. And the best way to do this is to mount the Docker daemon’s Unix socket on the host machine as a volume at the same location within the container. That means you add the following when performing a docker run: -v /var/run/docker.sock:/var/run/docker.sock. Now, within the container, if you have a Docker client (whether that’s the command-line one, or a Java one for example) connect to that Unix socket, it actually talks to the Docker daemon on the host. That means if you create a container from within the container with the volume, the new container is created using the daemon running on the host (meaning it will be a sibling of the container with the volume)! Very useful! A small example of talking to the host daemon from Java this way is sketched after this list.
  3. Consider Terraform as an alternative to compose – Terraform is for setting up infrastructure really easily, and it’s great for that. For us, infrastructure means AWS when running in the cloud, and Docker when running locally. We have several containers that we have to run for our application – during development, we run all the containers locally, and in the cloud, we run the containers across various EC2 instances, each instance getting one or more containers. This is perfect for Terraform. We can use the Docker provider alone to configure resources to run our local setup, and we can use it together with the AWS provider to run our cloud setup. Note again that Terraform is for infrastructure setup, so you are doing things at a very high level – you may find that you need to do some prep using other tools to be able to work with Terraform. For example, you can’t use Dockerfiles – you will have to build your custom images prior to using them with Terraform.
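
To make tip 2 a bit more concrete, here’s a minimal sketch of what talking to the host daemon from inside a container might look like using the third-party docker-java client – treat it as illustrative, since the exact builder API differs between docker-java versions:

import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Container;
import com.github.dockerjava.core.DockerClientBuilder;

public class HostDockerExample {
    public static void main(String[] args) {
        // Because /var/run/docker.sock is mounted from the host, this client
        // actually talks to the Docker daemon running on the host machine
        DockerClient docker = DockerClientBuilder.getInstance("unix:///var/run/docker.sock").build();

        // Any containers listed (or created) here are siblings of this container
        for (Container container : docker.listContainersCmd().exec()) {
            System.out.println(container.getId() + " " + container.getImage());
        }
    }
}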

 

Abusing Cucumber, for a good cause

In several Java houses I worked with in the past, we used Cucumber to do Behavior Driven Development. No, hang on a sec – that’s definitely an exaggeration. I think it’s more accurate to say we used Cucumber as a way to write acceptance tests. Wait, that’s still an exaggeration. We used Cucumber to write a mix of integration tests, and what may generously be called functional tests (very occasionally bordering on acceptance tests). Yeah, that’s about right. We used it as a tool to write tests in plain English. But you know what? I think that’s OK.

Cucumberistas, BDDers and DDDers will tell you it’s about everyone – business, QA and development – coming together to come up with executable specifications.  It’s about everyone speaking in a universal language – a language that the business analysts can share with the testers, and the developers. A language about the business problems an application is designed to solve. And a language for automated acceptance tests. Well maybe, just maybe, you are in an organization where that’s true. Where your Cucumber tests describe your user stories or specifications in the domain language for your application. If you are, good for you. You’re doing it “right”.

But for everyone else, I want to talk about some work we did to support your Cucumber test-writing efforts in the “wrong” way. And we don’t want to scold you, or admonish you for doing it “wrong”. No, in fact, we want to support you in your efforts to just write tests for HTTP services in English.

What I am talking about is best illustrated with an example – here’s how we use Cucumber to write tests for our application:

Background:
    Given the user stores http://localhost:9080 as apiRoot

Scenario: Successful registration flow
    Given a random alphanumeric string is stored as testUserName
    And a user makes a POST call to "{apiRoot}/users" with payload:
    """
    {
        "email": "{testUserName}@gmail.com",
        "password": "pass",
        "userName": "{testUserName}",
        "name": "Test User",
        "location": "London"
    }
    """
    Then the user should get a 200 response and JSON matching:
    """
    {
        "email": "{testUserName}@gmail.com",
        "userName": "{testUserName}",
        "name": "Test User",
        "location": "London",
        "id": "*"
    }
    """
    And the email containing subject Activate your account for {testUserName}@gmail.com is stored as activationEmail
    And the first link in stored HTML activationEmail is stored as activationLink
    And the regex activations/(\w+) is used on stored value activationLink to capture activationToken
    When a user makes a POST call to "{apiRoot}/tokens/activation/{activationToken}"
    Then the user should get a 200 response
    Given the user "{testUserName}@gmail.com" is logged in with password "pass" on "{apiRoot}"
    When a user makes a GET call to "{apiRoot}/widgets/{testUserName}"
    Then the user should get a 200 response and JSON matching:
    """
    []
    """

Yes, what we have here is a functional test for one of our stories. But all the steps are essentially an English version of what an HTTP client would do when hitting the service. A business analyst probably wouldn’t want to read that, but that’s really OK for us – business analysts, in our experience, don’t read the tests. Developers and testers read our tests, and it’s a great English-language description of what the test does. I don’t need to click through the code behind the step definitions to know what’s going on. As a developer, I can understand right away what is being done.

So if you are OK with writing tests this way, check out the cucumber module we created as part of datamill. It has all the step definitions you see in the example above. If you are writing HTTP services, especially those that serve JSON, and are backed by a relational database, you will find it useful. Oh, and we threw in some useful step definitions for dealing with emails too because we needed them.

I want to end by admitting the following about this approach: yes, sometimes this can get repetitive, with a lot of copy-pasting. So, I will leave you with a last example of a custom step definition we created that combines the utility ones above:

import cucumber.api.java.en.Given;
import foundation.stack.datamill.cucumber.DatabaseSteps;
import foundation.stack.datamill.cucumber.HttpSteps;
import foundation.stack.datamill.cucumber.PropertySteps;
import foundation.stack.datamill.http.Method;

public class UserSteps {
 private final DatabaseSteps databaseSteps;
 private final HttpSteps httpSteps;
 private final PropertySteps propertySteps;

 public UserSteps(PropertySteps propertySteps, DatabaseSteps databaseSteps, HttpSteps httpSteps) {
 this.propertySteps = propertySteps;
 this.databaseSteps = databaseSteps;
 this.httpSteps = httpSteps;
 }

 @Given("^the user \"(.+)\" is logged in with password \"(.+)\" on \"(.+)\"$")
 public void loginAsUser(String email, String password, String apiRoot) {
 httpSteps.userMakesCallWithProvidedPayload(Method.POST, apiRoot + "/tokens", "{" +
 "\"email\": \"" + email + "\"," +
 "\"password\": \"" + password + "\"" +
 "}");
 httpSteps.assertStatusAndNonEmptyResponse(200);
 httpSteps.storeResponse("JWT");
 httpSteps.addValueToHeader("Authorization", "{JWT}");
 }
}

Check out datamill, and the cucumber module!

Your own identity on the Internet

If you have ever thought about having users log in to your site, you’ve probably considered adding Facebook Login, or OAuth2 and OpenID Connect. And for good reason – they’re widely used.

An identity you own, to sign your content

But what if you wanted to allow users to own their identity? What would that look like? For a lot of technical folks, establishing identity usually means using a private key. Establishing identity using a private key also has the advantage that the user owns their own identity.

Let’s say that you establish your identity using your own private key. Any content you create can then be signed by you using your private key. Anyone can verify that it was you who created the content if they have your public key.

How does someone looking at a signed piece of content know what key was used to sign it? Well, you can publish your public key somewhere, and put a URL to that key next to the signature on the content you create. The URL would allow the reader to download the public key they need to verify the signature.
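
To make that concrete, here’s a minimal sketch of what signing and verifying content could look like using the JDK’s built-in crypto – the class and method names are just illustrative:

import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

public class ContentSigning {
    // Generate the key pair that establishes the author's identity
    public static KeyPair generateIdentity() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Sign a piece of content with the author's private key
    public static String sign(String content, PrivateKey privateKey) throws Exception {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(content.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(signer.sign());
    }

    // Verify a signature using the public key published next to the content
    public static boolean verify(String content, String signature, PublicKey publicKey) throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(content.getBytes(StandardCharsets.UTF_8));
        return verifier.verify(Base64.getDecoder().decode(signature));
    }
}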

Mirrors

But what if the URL to the public key goes down? Well, we can set up mirrors for public keys (you might use alternatives such as key servers here). Users should be able to notify mirrors of a new public key that they’ve published. Sites hosting content can also serve cached copies of public keys (that they picked up from the original source, or from a mirror) along with the content.

Claims, and verified claims

So far, we only have the ability to establish that some piece of content was created by someone owning a certain key. But we have not yet established who the person behind the key is. How can we do that? Well, let’s say that with every key, you can have a set of claims – metadata attributes associated with it. So for example, we can say some key key1 belongs to some user claiming that their fullName is Joe Blogs, and that their facebookProfile is http://facebook.com/joeblogs (fullName and facebookProfile are claims here). Great, so now we can say that wherever we see content signed with key key1, it belongs to Joe Blogs, whose Facebook profile is at http://facebook.com/joeblogs.

Of course, the obvious problem with this is that anyone can publish their key, and associate it with a bogus set of claims. What we need is a way to have verified claims. For example, we would especially want to verify that someone who claims to own a particular Facebook profile actually owns that profile. How do we do that? Well, we can have a service that provides verified facebookProfile claims. That is, a service that uses Facebook Login to allow the owner of a key to log in to their Facebook account to prove ownership, and only then confirms that the owner of that key owns that Facebook account.

Here is how that flow might work:

  1. The owner of the key signs a facebookProfile claim with their private key – let’s call the signature they produce here claimSignature
  2. They provide claimSignature to the Facebook verification service, which should first check that the provided claimSignature is correct and was produced by the owner of the key
  3. It should then have them log in to the Facebook profile they claim to own, using Facebook Login
  4. Once the service has verified that they own the Facebook account, the service would then sign claimSignature with its own private key to create a verifiedClaimSignature

Now, if we were given the claimSignature and the verifiedClaimSignature, together with the facebookProfile claim, we can trust that association a bit more. We would need to decide that the Facebook verification service we used is trustworthy in evaluating facebookProfile claims. If we do, all we need is the public key for that service to verify the verifiedClaimSignature and confirm that the facebookProfile claim can be trusted.
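
In code, that final check might look something like this, reusing the hypothetical sign/verify helpers sketched above (the signatures and public keys are assumed to have been fetched already):

public static boolean isClaimTrusted(String claim, String claimSignature, String verifiedClaimSignature,
                                     PublicKey userPublicKey, PublicKey servicePublicKey) throws Exception {
    // 1) The claim really was signed by the owner of the user's key
    boolean claimValid = ContentSigning.verify(claim, claimSignature, userPublicKey);
    // 2) The verification service really did countersign that claim signature
    boolean serviceVerified = ContentSigning.verify(claimSignature, verifiedClaimSignature, servicePublicKey);
    return claimValid && serviceVerified;
}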

Decentralized identity

What does this allow at the end of the day? Suppose you wrote a blog post, or posted a comment somewhere on the web. You can now sign that content, and someone reading the content would be able to know that it was you who wrote it. And they would be able to know that based on the identity you own – your personal private key. Everyone can own their own identity.

A functional reactive alternative to Spring

Modern-day Spring allows you to be pretty concise. You can get an elaborate web service up and running using very little code. But when you write idiomatic Spring, you find yourself strewing your code with lots of magic annotations, whose function and behavior are hidden within complex framework code and documentation. When you want to stray away slightly from what the magic annotations allow, you suddenly hit a wall: you start debugging through hundreds of lines of framework code to figure out what it’s doing, and how you can convince the framework to do what you want instead.

datamill is a Java web framework that is a reaction to that approach. Unlike other modern Java frameworks, it makes the flow and manipulation of data through your application highly visible. How does it do that? It uses a functional reactive style built on RxJava. This allows you to be explicit about how data flows through your application, and how to modify that data as it does. At the same time, if you use Java 8 lambdas (datamill and RxJava are intended to be used with lambdas), you can still keep your code concise and simple.

Let’s take a look at some datamill code to illustrate the difference:

public static void main(String[] args) {
 OutlineBuilder outlineBuilder = new OutlineBuilder();

 Server server = new Server(
  rb -> rb.ifMethodAndUriMatch(Method.GET, "/status", r -> r.respond(b -> b.ok()))
  .elseIfMatchesBeanMethod(outlineBuilder.wrap(new TokenController()))
  .elseIfMatchesBeanMethod(outlineBuilder.wrap(new UserController()))
  .orElse(r -> r.respond(b -> b.notFound())),
  (request, throwable) -> handleException(throwable));

 server.listen(8081);
}

 

A few important things to note:

  • datamill applications are primarily intended to be started as standalone Java applications – you explicitly create the HTTP server, specify how requests are handled, and have the server start listening on a port. Unlike traditional JEE deployments where you have to worry about configuring a servlet container or an application server, you have control of when the server itself is started. This also makes creating a Docker container for your server dead simple. Package up an executable JAR using Maven and stick it in a standard Java container.
  • When an HTTP request arrives at your server, it is obvious how it flows through your application. The line rb.ifMethodAndUriMatch(Method.GET, "/status", r -> r.respond(b -> b.ok()))

    says that the server should first check if the request is an HTTP GET request for the URI /status, and if it is, return an HTTP OK response.

  • The next two lines show how you can organize your request handlers while still maintaining an understanding of what happens to the request. For example, the line .elseIfMatchesBeanMethod(outlineBuilder.wrap(new UserController()))

    says that we will see if the request matches a handler method on the UserController instance we passed in. To understand how this matching works, take a look at the UserController class, and one of the request handling methods:

    @Path("/users")
    public class UserController {
     ...
     @GET
     @Path("/{userName}")
     public Observable<Response> getUser(ServerRequest request) {
       return userRepository.getByUserName(request.uriParameter("userName").asString())
        .map(u -> new JsonObject()
         .put(userOutlineCamelCased.member(m -> m.getId()), u.getId())
         .put(userOutlineCamelCased.member(m -> m.getEmail()), u.getEmail())
         .put(userOutlineCamelCased.member(m -> m.getUserName()), u.getUserName()))
        .flatMap(json -> request.respond(b -> b.ok(json.asString())))
        .switchIfEmpty(request.respond(b -> b.notFound()));
      }
      ...
    }

    You can see that we use @Path and @GET annotations to mark request handlers. But the difference is that you can pinpoint where the attempt to match the HTTP request to an annotated method was made. It was within your application code – you did not have to go digging through hundreds of lines of framework code to figure out how the framework is routing requests to your code.

  • Finally, in the code from the UserController, notice how the response is created – and how explicit the composition of the JSON is within datamill:
    .map(u -> new JsonObject()
    .put(userOutlineCamelCased.member(m -> m.getId()), u.getId())
    .put(userOutlineCamelCased.member(m -> m.getEmail()), u.getEmail())
    .put(userOutlineCamelCased.member(m -> m.getUserName()), u.getUserName()))
    .flatMap(json -> request.respond(b -> b.ok(json.asString())))

    You have full control of what goes into the JSON. For those who have ever tried to customize the JSON output by Jackson to omit properties, or for the poor souls who have tried to customize responses when using Spring Data REST, you will appreciate the clarity and simplicity.

Just one more example from an application using datamill – consider the way we perform  a basic select query:

public class UserRepository extends Repository<User> {
 ...
 public Observable<User> getByUserName(String userName) {
  return executeQuery(
   (client, outline) ->
   client.selectAllIn(outline)
   .from(outline)
   .where().eq(outline.member(m -> m.getUserName()), userName)
   .execute()
   .map(r -> outline.wrap(new User())
    .set(m -> m.getId(), r.column(outline.member(m -> m.getId())))
    .set(m -> m.getUserName(), r.column(outline.member(m -> m.getUserName())))
    .set(m -> m.getEmail(), r.column(outline.member(m -> m.getEmail())))
    .set(m -> m.getPassword(), r.column(outline.member(m -> m.getPassword())))
    .unwrap()));
 }
 ...
}

A few things to note in this example:

  • Notice the visibility into the exact SQL query that is composed. For those of you who have ever tried to customize the queries generated by annotations, you will again appreciate the clarity. While in any single application, a very small percentage of the queries need to be customized outside of what a JPA implementation allows, almost all applications will have at least one of these queries. And this is usually when you get the sinking feeling before delving into framework code.
  • Take note of the visibility into how data is extracted from the result and placed into entity beans.
  • Finally, take note of how concise the code remains, with the use of lambdas and RxJava Observable operators.

Hopefully that gives you a taste of what datamill offers. What we wanted to highlight was the clarity you get on how requests and data flows through your application, and the clarity into how data is transformed.

datamill is still in an early stage of development but we’ve used it to build several large web applications. We find it a joy to work with.

We hope you’ll give it a try – we are looking for feedback. Go check it out.

Weave social into the web

Disclaimer: This is the second post in a series where we are exploring a decentralized Facebook (here’s the first). It’s written by software engineers, and is mostly about imagining a contrived (for now) technical architecture.

How do you weave elements of Facebook into the web? Start by allowing users to identify themselves and all of their content:

  • Establishing a user’s identity can be done rather straightforwardly by creating a unique public-private key pair for a user and allowing them to digitally sign things using their private key
  • Users can then digitally sign content they create anywhere on the internet – they can sign articles they publish, blog posts, comments, photos, likes and +1’s, anything really

Now that they’ve started to identify their content, it’s time to make everyone aware of it:

  • Notifications about the content users generate need to be broadcast in real-time to a stream of events about the user
  • Notifications can be published to the stream by the browser, or a browser plug-in, or by the third-party application on which the content was generated
  • Before being accepted into a user’s stream, notifications need to be verified as being about the user and their content, by the presence of a digital signature
  • Other parties interested in following a user can subscribe to a user’s feed

But that’s all in the public eye. To have a social network, you really need to allow for some privacy:

  • Encrypt data, and allow it to be decrypted selectively – this may include partial content – for example, it’d be useful to have a comment on an otherwise unencrypted site encrypted, only accessible by a select few consumers
  • Allow encrypted content to be sent over plain HTTP over TCP (not TLS) – this way the encrypted payload can be mirrored, allowing consumer privacy (if the consumer can access encrypted data from a mirror, it can do so privately, without the knowledge of the publisher)
  • Encryption is performed with a unique key for every piece of content
  • Decryption is selective in that the decryption key is given out selectively by the publisher (based on authorization checks they perform) – a rough sketch of this per-content key scheme follows below
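
Here’s a rough sketch of that last idea – a unique symmetric key per piece of content, wrapped separately for each consumer the publisher authorizes. It uses standard JDK crypto, and the class is purely illustrative:

import java.nio.charset.StandardCharsets;
import java.security.PublicKey;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class ContentEncryption {
    // Every piece of content gets its own symmetric key
    public static SecretKey newContentKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Encrypt the content with its per-content key; iv is a fresh random
    // 12-byte value that is stored alongside the ciphertext
    public static byte[] encryptContent(String content, SecretKey contentKey, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, contentKey, new GCMParameterSpec(128, iv));
        return cipher.doFinal(content.getBytes(StandardCharsets.UTF_8));
    }

    // Wrap the per-content key for one authorized consumer using their public key;
    // the publisher hands out one wrapped key per consumer it authorizes
    public static byte[] wrapKeyForConsumer(SecretKey contentKey, PublicKey consumerKey) throws Exception {
        Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        cipher.init(Cipher.WRAP_MODE, consumerKey);
        return cipher.wrap(contentKey);
    }
}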