10 tips for migrating from Maven to Gradle

Here’s a quick list of 10 lessons we learned when making the switch to Gradle from Maven for StackFoundation. Coming from a deeply Maven place, these are the things that gave us an “Aha! That’s how you do it in Gradle!” moment. As with any other tool, these are not the only ways (nor necessarily the best) to do some of these things with Gradle – this is not meant to be a prescriptive list of best practices. Rather, it’s just a few things to help those Mavenistas out there who are thinking of switching to Gradle, or actively switching, and figuring out how to get their minds to think Gradle.

1) Forget the GAV, use the directory layout!

Maven folks are used to thinking about a module’s GAV – its group, artifact, and version. When you switch to Gradle, you don’t have to think about this so much. By default, Gradle names projects after their directories. So if you have the following multi-project directory structure:

  • server
    • core
      • src/main/java
    • logging
      • src/main/java

These projects are named server, core, and logging. In Gradle, projects are identified by a fully-qualified path – in this case the paths are going to be :server, :server:core and :server:logging.

Note: You can still give projects a group and a version if you want.
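
For the layout above, a minimal settings.gradle at the root could look like the following sketch (the root project name is just an example):

// settings.gradle at the root of the multi-project build
rootProject.name = 'example-root' // example name; pick your own

// including the leaf projects implicitly creates :server as well
include ':server:core'
include ':server:logging'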

2) Build everything from the root!

More than a GAV, when you start using Gradle regularly, you’ll start thinking of projects by their path.

In Maven, you’re probably used to switching to a particular sub-module directory and then invoking mvn clean install, etc. from there. In Gradle, you’re going to kick off all builds from the root of your multi-project setup. And you can simply use a sub-project’s path to kick off a task for that project. For example, you can invoke gradlew :server:logging:build to build the logging sub-project within the top-level server project.

3) Use custom tasks!

In Maven, if you need to perform some custom logic for a build or deploy, you go hunting for a particular plug-in, and then see if invoking one of its goals at some spot within the fixed Maven build lifecycle accomplishes what you want. If not, you look for another plug-in. And another, and then you might try writing one yourself.

Gradle is fundamentally built around tasks. You’re going to end up writing a custom task for a lot of what you want to do. Build a package by combining things in a specific way? Write a task. Deploy a service? Write a task. Set up infrastructure? Write a task. And remember, all Gradle scripts are Groovy scripts, so you are writing Groovy code when writing your tasks. Most of the time, you won’t write a task from scratch – you’ll start with a plug-in (yes, like in Maven, you’ll start by searching for plug-ins) and one of the tasks it defines, and then customize it!
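
As a minimal sketch (the task name and paths are invented for illustration), a custom task in build.gradle is just Groovy code configuring a task type that already exists, here the built-in Copy type:

// build.gradle - a hypothetical custom task built on the Copy task type
task stageServicePackage(type: Copy) {
    from "$buildDir/libs"   // take the JARs the build produced
    into "$buildDir/deploy" // stage them where a deploy task can pick them up
    include '*.jar'
}

Run it with gradlew stageServicePackage, or with its full project path in a multi-project build.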

4) Name your tasks, give them a group and description!

If you have a complex Maven project, you are very likely using a number of profiles, you will probably have a specific order in which to build things, and maybe even a specific order in which to run things with different profiles activated. You’ll end up documenting this on a wiki or in a README file in your Git repo. And then you’ll forget to update that document, so eventually how exactly something is built is only tribal knowledge.

In Gradle, you create custom tasks. That was already point 3 – but once you create them, you can give these tasks a group and a description. We give our most important custom tasks the group name ‘StackFoundation’. That way, when we run gradlew tasks, we see the tasks specific to our project listed under that group among the available tasks. A great way to document our tasks.
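
Continuing the hypothetical task from point 3, a group and description are just two more lines; gradlew tasks then lists the task under that group:

task stageServicePackage(type: Copy) {
    group = 'StackFoundation'
    description = 'Stages the service JARs for deployment'
    from "$buildDir/libs"
    into "$buildDir/deploy"
    include '*.jar'
}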

5) Alias tasks, name them something you will remember!

Picking up from 3 and 4: you can create a task just to alias another task defined by a plugin. For example, the Shadow plugin is the Gradle version of the Maven Shade plugin. You might be happy with the default shadowJar task it provides, but if a more meaningful name in your project for creating that shadow JAR package is createServicePackage, you can create an alias:

task createServicePackage(dependsOn: shadowJar)

Note: It’s not exactly an alias, but close enough.
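
For instance, assuming the Shadow plugin is already applied, the alias can also carry the group and description from point 4 so it shows up nicely in gradlew tasks:

// the alias does nothing itself; it just triggers shadowJar
task createServicePackage(dependsOn: shadowJar) {
    group = 'StackFoundation'
    description = 'Creates the service package (a shadow JAR)'
}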

6) The Shadow plugin is the Gradle version of the Maven Shade plugin

This one is used by enough Maven folks that it’s worth repeating.

7) Use the Gradle wrapper

With Maven, you have to get everyone to set up Maven, or use an IDE that comes with Maven built in, in order to run builds for your project. With Gradle, there’s the Gradle wrapper – and you’re meant to check it in to your team’s repo. Set up your project to use the wrapper and put it in your source control repo! Your team won’t have to think about getting Gradle.
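
As a sketch, one way to pin the Gradle version the wrapper downloads is to declare a Wrapper task in your root build.gradle (the version number here is just an example):

// build.gradle - configure the wrapper; run 'gradle wrapper' once to generate the files
task wrapper(type: Wrapper) {
    gradleVersion = '3.5' // example version
}

Running gradle wrapper then generates gradlew, gradlew.bat and the gradle/wrapper directory, and those are the files to commit.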

8) Forget the inheritance parent, use external build scripts to define common tasks

In Maven, you use an inheritance parent to manage dependencies and plugins.

With Gradle, you can reference other Gradle files from a build.gradle file – you do that using something that looks like: apply from: '../../gradle/services.gradle'. These are called external build scripts, and there are some caveats to using them, but they’re a great way to define common tasks. For example, you can create some common tasks for deploying any of the services you use in your projects inside gradle/services.gradle and reference them from your other Gradle files.
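
As a rough sketch (the task body here is invented for illustration), a services.gradle like that just defines tasks against whichever project applies it:

// gradle/services.gradle - common tasks shared by our service projects
task deployService {
    group = 'StackFoundation'
    description = 'Deploys this service'
    doLast {
        // 'project' here is the project that applied this script
        println "Deploying ${project.name} from ${project.buildDir}"
    }
}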

Note: You can also put common task code inside buildSrc.

9) Forget the inheritance parent, create custom libraries

In Maven, you use a parent POM to define common dependencies. With Gradle, you can define common dependencies by putting them in an external build script (described in point 8). Here’s an example gradle/dependencies.gradle file which defines some common libraries we use in all of our projects:

repositories {
    mavenLocal()
    mavenCentral()
}

ext {
    libraries = [
            aws            : {
                it.compile('com.amazonaws:aws-java-sdk-s3:1.11.28') {
                    exclude group: 'org.apache.httpcomponents', module: 'httpclient'
                    exclude group: 'com.fasterxml.jackson.core', module: 'jackson-annotations'
                    exclude group: 'com.fasterxml.jackson.core', module: 'jackson-core'
                    exclude group: 'com.fasterxml.jackson.core', module: 'jackson-databind'
                    exclude group: 'com.fasterxml.jackson.dataformat', module: 'jackson-dataformat-cbor'
                }
            },
            awsEcr         : 'com.amazonaws:aws-java-sdk-ecr:1.11.28',
            datamill       : {
                it.compile('foundation.stack.datamill:core:0.1.1-SNAPSHOT') {
                    exclude group: 'org.apache.httpcomponents', module: 'httpclient'
                }
            },
            datamillLambda : 'foundation.stack.datamill:lambda-api:0.1.1-SNAPSHOT',
            junit5 : [
                'org.junit.jupiter:junit-jupiter-api:5.0.0-M4',
                'org.junit.jupiter:junit-jupiter-migration-support:5.0.0-M4'
            ],
    ]
}

Note the use of GAVs to refer to Maven dependencies, and how you can set up exclusions using this approach. With this approach, we get to give our own names to these libraries instead of referring to everything with GAVs. This is especially great for us because, colloquially, we refer to our dependencies using these names, and this makes looking at project dependency information clear and concise. In addition, we can group multiple Maven dependencies into one custom user library, as with the junit5 example.

Here’s how a particular project defines the libraries as dependencies:

dependencies {
    compile libraries.datamill(it)
    testCompile libraries.junit5
}

10) Doing resource filtering

In Maven, you probably use resource filtering to replace property placeholders in resource files. There are two equivalents in Gradle – the first is to use ReplaceTokens:

processResources {
    def props = [imageVersion: 'unspecified']
    filesMatching('*.properties') {
        filter(org.apache.tools.ant.filters.ReplaceTokens, tokens: props)
    }
}

This looks for placeholders of the form @imageVersion@, i.e., they’re delimited by @’s. It tolerates missing property names. A second form looks like this:

processResources {
    def props = [imageVersion: 'unspecified']
    filesMatching("**/*.yaml") {
        expand props
    }
}

This looks for property placeholders of the form $imageVersion – well, sort of. It’s actually using a template mechanism in Groovy, which makes it very powerful, but if you use it for simple cases you’ll probably encounter the following: if a placeholder references a missing property, your build will fail with an error!

That’s all for now! More lessons from our experience migrating to Gradle at StackFoundation will be for a future post! Hope that helps those of you making the switch from Maven!

 

Going serverless with Lambdas

This is the first post of a series where we’ll be writing about the motivation and decision making process behind our cloud strategy for Sandbox.

Setting the stage

Our first assumption was that, given that Sandbox follows a client-server architecture and that most of its core functionality is on the client side, our cloud requirements wouldn’t be as high as if we were a pure SaaS product.

When we started architecting our cloud components for Sandbox, we came up with the following list of requirements:

  1. It had to give us a short time-to-market. Rewriting certain parts later was something we would accept as a side effect of this.
  2. It had to be cost efficient. We didn’t anticipate much traffic in the early days after our initial go-live, so it felt natural to look for a ‘Pay as you go’ model rather than having resources sitting idle until we ramp up our user base.
  3. It needed to allow us a simple go-live strategy, both in terms of process and of the tooling used. It obviously also needed to leverage our previous knowledge of cloud providers and products/services.

So, what did we decide to go with, and most importantly, why?

First, choosing the cloud provider: we had a certain degree of experience using AWS, from our professional life as consultants, but especially from our days working at StackFoundation, so this was the easy choice!

The AWS offering is huge, and constantly evolving, so deciding which services to use wasn’t that straightforward. From a computing perspective, there seemed to be two clear choices: using ECS to manage our EC2 instances, or going serverless with Lambdas. Our experience leaned more towards EC2, but when we revisited our cloud architecture requirements, we concluded Lambdas were a better fit, as they ticked all the boxes:

  1. In terms of the actual application development, choosing EC2 or Lambdas didn’t seem to make much of a difference. Once you decide whether to go with plain Lambdas or use a framework to make Lambda development closer to standard API development, feature development was pretty straightforward, with a low learning curve. We initially went with Jrestless, although we later crafted our own solution, the lambda-api module of the Datamill project; more on this in future posts. In the EC2 world, this would have meant us using micro-services running on containers, so pretty much the same picture.
  2. The term ‘Pay as you go’ feels like it was coined for Lambdas, at least in terms of compute resources. You get billed based on your memory requirements and actual usage period (measured to the nearest 100ms). With EC2 instances, on the other hand, you have to keep your instances live 24×7. So we either had to go with low-powered T2 family instances to keep the cost low and struggle if we got unpredicted usage spikes, or go with M3 family instances and potentially waste money on them during periods when they were under-used, which would probably be the case during our early preview period.
  3. In terms of overall architecture, it is more complicated to configure ECS clusters and all the different components around them than to simply worry about actual application code development, which is one of the selling points of Lambdas (and the serverless movement in general). This may be an overstatement, as even with Lambdas there is additional configuration (for example, you also need to take care of permissions, namely Lambda access to other AWS resources like SES, S3, DDB, etc.), but it is definitely at least an order of magnitude simpler than doing this on the ECS front. In addition, even though the AWS console is the natural path to start with until you get a grasp of how the service works and the different configuration options you have, you will eventually want to migrate to some kind of tooling to take care of deployment, both for your lower environments and for production. We came across Serverless, a NodeJS-based framework that helps you manage your Lambda infrastructure (or the function-based offerings of other providers). The alternative would probably have been to go with Terraform, an infrastructure-as-code framework. Our call here was motivated both by project needs and by the learning curve of the tooling. If we went with Lambdas, they would be practically the only AWS resources we needed to worry about, so Serverless offered all we needed in this sense. In terms of the learning curve, using a general-purpose framework like Terraform would have meant dealing with concepts and features we wouldn’t need for our use case.

So, now that we’re live, how do we feel about our assumptions/decisions after the fact?

In terms of application development, the only caveat we found was around E2E testing. We obviously have a suite of unit and integration tests on our cloud components, but during development we realised some bugs arose from the integration with our cloud frontend that couldn’t be covered by our integration tests. This was especially true when services like API Gateway were involved, as we haven’t found a way of simulating these in our local environment. We came across localstack late in the day and gave it a quick chance, but it didn’t seem to be stable enough. We couldn’t spend longer on it at the time, so we decided to cover these corner scenarios with manual testing. We might revisit this decision going forward, though.

In terms of our experience working with Lambdas, the only downside we came across was the problem of cold Lambdas. AWS will keep your Lambdas running for an undocumented period, after which any usage means the Lambda infrastructure needs to be recreated before it can serve requests again. We knew about this before going down the Lambda path, so we can’t say it came as a surprise. We had agreed that, in the worst-case scenario, we would go with a solution we found proposed on several sites: having a scheduled Lambda that keeps our API Lambdas “awake” permanently. To this end, we added a health check endpoint to every function, which is called by our scheduled Lambda. In future releases we will also use the health check response to take further actions, for example notifying us of downtime.

Overall, we’re quite happy with the results. We may decide to change certain aspects of our general architecture, or even move to an ECS-based solution if we go SaaS, but for the time being this one seems like a perfect fit for our requirements. Further down the line, we’re planning to write a follow-up post on the concepts discussed in this post, with some stats and further insights into how this works both in production and as part of our SDLC.

Of course, there is no rule of thumb here, and what might work for us may not work for other teams/projects, even if they share our list of requirements. In any case, if you’re heading down the serverless path, we’d love to hear about your experiences!