Wednesday, 3 May 2017

This week 8/2017

In this post I share two small tips.

SLF4J has one handy feature. A single interface solves two problems:
1) it first checks whether the message will be logged at the current level, and only then converts the parameters to strings and builds the whole message,
2) it keeps the code clean, with no manual concatenation of parameters.

LOGGER.debug("Test message {} {} {} {} {}", 3, "+", 3, "=", 6);

Earlier I used to check the logging level manually and format messages with String.format.
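The idea can be illustrated with a small, self-contained sketch of what such parameterized logging does under the hood (the class and method names here are mine, not SLF4J internals): the message is only built when the level is enabled, and each {} placeholder is replaced in turn.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogging {

    // Replace each "{}" placeholder with the next argument, left to right.
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) break;                        // more args than placeholders
            sb.append(pattern, from, at).append(arg); // arg converted to String here
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo");
        // The expensive string building happens only if the level is enabled:
        if (logger.isLoggable(Level.FINE)) {
            logger.fine(format("Test message {} {} {} {} {}", 3, "+", 3, "=", 6));
        }
        System.out.println(format("Test message {} {} {} {} {}", 3, "+", 3, "=", 6));
    }
}
```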


Lombok - the builder pattern is a great way to create an immutable value object with many attributes. Lombok generates the builder for you; you only have to add the @Builder annotation. Unfortunately, the generated setters follow a different naming convention (there is no set prefix), and @Builder is of little use for a class that inherits fields from a superclass.

Below I compare a simple class with Lombok annotations and its generated equivalent.
import lombok.Builder;
import lombok.Data;
import lombok.NonNull;

@Data
@Builder
public class MyLoombok {

 @NonNull
 private final String attr1;

 @NonNull
 private final int attr2;
}


The builder code Lombok generates:
import lombok.NonNull;

public class MyLoombok {

 @NonNull
 private final String attr1;

 @NonNull
 private final int attr2;

 @java.beans.ConstructorProperties({"attr1", "attr2"})
 MyLoombok(String attr1, int attr2) {
  this.attr1 = attr1;
  this.attr2 = attr2;
 }

 public static MyLoombokBuilder builder() {
  return new MyLoombokBuilder();
 }

 @NonNull
 public String getAttr1() {
  return this.attr1;
 }

 @NonNull
 public int getAttr2() {
  return this.attr2;
 }

 public boolean equals(Object o) {
  if (o == this) return true;
  if (!(o instanceof MyLoombok)) return false;
  final MyLoombok other = (MyLoombok) o;
  if (!other.canEqual((Object) this)) return false;
  final Object this$attr1 = this.getAttr1();
  final Object other$attr1 = other.getAttr1();
  if (this$attr1 == null ? other$attr1 != null : !this$attr1.equals(other$attr1)) return false;
  if (this.getAttr2() != other.getAttr2()) return false;
  return true;
 }

 public int hashCode() {
  final int PRIME = 59;
  int result = 1;
  final Object $attr1 = this.getAttr1();
  result = result * PRIME + ($attr1 == null ? 43 : $attr1.hashCode());
  result = result * PRIME + this.getAttr2();
  return result;
 }

 protected boolean canEqual(Object other) {
  return other instanceof MyLoombok;
 }

 public String toString() {
  return "singleclass.MyLoombok(attr1=" + this.getAttr1() + ", attr2=" + this.getAttr2() + ")";
 }

 public static class MyLoombokBuilder {
  private String attr1;
  private int attr2;

  MyLoombokBuilder() {
  }

  public MyLoombok.MyLoombokBuilder attr1(String attr1) {
   this.attr1 = attr1;
   return this;
  }

  public MyLoombok.MyLoombokBuilder attr2(int attr2) {
   this.attr2 = attr2;
   return this;
  }

  public MyLoombok build() {
   return new MyLoombok(attr1, attr2);
  }

  public String toString() {
   return "singleclass.MyLoombok.MyLoombokBuilder(attr1=" + this.attr1 + ", attr2=" + this.attr2 + ")";
  }
 }
}
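Using the generated builder is straightforward; note the fluent setters without the set prefix. Below is a self-contained sketch with a trimmed, hand-written copy of the generated class (only the parts needed here), so the usage can be tried without Lombok itself:

```java
public class BuilderDemo {

    // Trimmed, hand-written equivalent of the Lombok-generated class above.
    static class MyLoombok {
        private final String attr1;
        private final int attr2;

        MyLoombok(String attr1, int attr2) {
            this.attr1 = attr1;
            this.attr2 = attr2;
        }

        static MyLoombokBuilder builder() {
            return new MyLoombokBuilder();
        }

        String getAttr1() { return attr1; }
        int getAttr2() { return attr2; }
    }

    static class MyLoombokBuilder {
        private String attr1;
        private int attr2;

        // Fluent setters: no "set" prefix, each returns the builder itself.
        MyLoombokBuilder attr1(String attr1) { this.attr1 = attr1; return this; }
        MyLoombokBuilder attr2(int attr2) { this.attr2 = attr2; return this; }

        MyLoombok build() { return new MyLoombok(attr1, attr2); }
    }

    public static void main(String[] args) {
        MyLoombok value = MyLoombok.builder()
                .attr1("hello")
                .attr2(42)
                .build();
        System.out.println(value.getAttr1() + " " + value.getAttr2());
    }
}
```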

Tuesday, 18 April 2017

This week 7/2017

Spring Boot (v. 1.5.x) is a project which supports developers in creating and booting applications. An application created with it integrates with the long list of technologies mentioned in the project's reference guide.
Creating a complex application is very easy. The developer adds dependencies on starter artefacts and the application works with default settings. If the defaults need to be changed, he can add a properties file, add custom annotations, or set custom settings in code.
A simple application is ready in a few seconds: all that is needed is a pom or Gradle file and a few lines of code - a simple class with the
@EnableAutoConfiguration
annotation and a main method which calls the run method of the SpringApplication class.
If a web application is needed, that is not a problem either: Spring Boot supports three web containers - Tomcat, Jetty and Undertow. If something else is needed, it is probably not a problem as well; the list is truly long.
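The minimal application described above can be sketched like this (a sketch, assuming a Spring Boot starter such as spring-boot-starter-web is on the classpath; the class name is my own):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;

// A minimal Spring Boot entry point, as described above.
@EnableAutoConfiguration
public class DemoApplication {

    public static void main(String[] args) {
        // Boots the embedded container and auto-configures the classpath stack.
        SpringApplication.run(DemoApplication.class, args);
    }
}
```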

Spring Boot also contains development tools supporting HTTP caching, automatic restart of the application after a source update, and a few things less important for me.
In my case, I was most interested in automatic restart and hot swapping. DevTools do not contain a solution as good as JRebel or Spring Loaded, because they do not reload byte code at runtime but restart part of the application. The solution splits the classpath into unchangeable and changeable paths. The first is loaded by the standard class loader. The second is loaded by a loader which is thrown away on restart and created anew. It is possible to choose which jar files should be reloaded on restart.

Spring Boot supports loading configuration parameters (@ConfigurationProperties) and validating them during application start.

In the end I can add that there is a page with an application generator (Spring Initializr). The generator creates a pom/Gradle file with the selected technology stack and sample code.

Sunday, 26 March 2017

This week 6/2017

I did a quick review of Vue.js (v. 2.2.2) and I'd like to summarise what I got to know about it, how it compares to Angular 2, and what I think about it.

1. Performance
The project krausest/js-framework-benchmark tests over 20 JavaScript frameworks; the results are below.

src: js-frameworks-benchmark4

As you can see, in most cases Angular 2 is slower than Vue.js.


2. Size of attached scripts
In my case a small Angular 2 project with 3 additional modules takes 800 kB (production version, after minification). I wondered what the size would be with Vue.js. I found a comparison of the raw frameworks on the Vue.js web site: Vue.js is about 23 kB while Angular 2 is about 50 kB. It's really interesting...


3. Learning curve
The creators of the Vue framework estimate that it is possible to learn it in one day, or faster if you know AngularJS. We will see...
In my opinion, learning Angular 2 in one day is impossible. The same goes for AngularJS, although I think it was easier than Angular 2.

4. Testing
Judging by the project web page, unit testing looks similar to unit testing in AngularJS or Angular 2.



Resources:
  1. https://github.com/krausest/js-framework-benchmark
  2. http://stefankrause.net/js-frameworks-benchmark4/webdriver-ts/table.html
  3. http://www.valuecoders.com/blog/technology-and-apps/vue-js-comparison-angular-react/
  4. https://vuejs.org/v2/guide/comparison.html#Angular-2

This week 5/2017

In this article I'd like to collect some ideas connected with Microservice Architecture.

1. Bounded context and organisational aspect.
In many sources there is a recommendation to create small teams, each dedicated to one or a few microservices. One microservice should solve the problem of one bounded context. A small team should understand the implemented part of the domain (the bounded context) well and should be at most a "two-pizza team".

2. Possibility of scaling every service of the application
Microservice architecture requires many servers, which is why they have to be monitored and managed centrally. On each server runs one instance of a service, but if we need more instances it is very easy to install it on more servers. This way it is possible to start a new instance of a service and shut it down when it is no longer needed. Another advantage of microservices is that a small application boots faster than a big monolith, and it is easy to deploy new code without an outage.

3. Duplication of code
There is a common opinion that it is much easier to control a monolith than microservices. Martin Fowler recommends trying to solve the business problem in a monolith architecture first and only then, eventually, moving to microservices. However, this style has become fashionable and everyone tries their hand at it, even when it has nothing to do with business needs... During the migration process there is a lot of code duplicated between services, which can also be a big problem. My approach is to move the shared parts of the code into small utility jars.

4. Latency & lack of consistency = additional complexity
A common question is how to get consistency and availability at the same time. The answer is sad: it is impossible. Consistency requires a shared database where every business operation locks the updated or removed record.
However, firstly we have to remember that encoding, transmitting and decoding a message takes time; especially if services are spread over worldwide locations, transmission has a significant influence.
Secondly, fault-tolerant services move a task to an available instance of a service.
Then, to get high performance, every service (or even every instance of a service) has its own database. That is why in most cases optimistic locking, data partitioning, or CQRS and Event Sourcing are used. Sometimes that is not enough, and some business decisions are required to be ready for border conditions, e.g. the hotel-room reservation problem.
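Optimistic locking, mentioned above, can be sketched in a few lines (the Record/Store names are illustrative, not from any concrete library): each update carries the snapshot it was based on and fails if the record has changed in the meantime.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticLockDemo {

    static final class Record {
        final int version;
        final String value;
        Record(int version, String value) { this.version = version; this.value = value; }
    }

    static final class Store {
        private final AtomicReference<Record> current =
                new AtomicReference<>(new Record(0, "initial"));

        Record read() { return current.get(); }

        // Succeeds only if the record is still the snapshot the caller read;
        // the winner bumps the version, the loser must re-read and retry.
        boolean update(Record expected, String newValue) {
            return current.compareAndSet(expected, new Record(expected.version + 1, newValue));
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Record snapshotA = store.read();
        Record snapshotB = store.read();
        System.out.println(store.update(snapshotA, "A wins"));  // first update succeeds
        System.out.println(store.update(snapshotB, "B loses")); // stale snapshot fails
        System.out.println(store.read().value + " v" + store.read().version);
    }
}
```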

5. Interface development - contract problem
This subject is well known from SOA, and it applies to microservices as well. The common solution in SOA is to use the same WSDL file for producer and consumer, but what if we have many different consumers and we need to extend our interface without engaging the teams responsible for services that must not change?
One way to do that is the Consumer-Driven Contract. This approach requires each consumer to use only the fields it needs and to inform the producer about the consumer's schema.



Sunday, 5 March 2017

This week 4/2017

This post collects some of my experiences with creating E2E tests in Protractor using TypeScript for an AngularJS page.
At the beginning it is required to prepare the Protractor (v4.0.9) configuration. Mine is below.


exports.config = {
  specs: [
    './e2e/angularjs/**/*.e2e-spec.ts'
  ],
  capabilities: {
    'browserName': 'chrome',
  },
  directConnect: true,
  baseUrl: 'http://localhost:8080/test/',

  framework: 'jasmine',
  rootElement: 'html',
  jasmineNodeOpts: {
        // If true, display spec names.
        isVerbose: true,
        // If true, print colors to the terminal.
        showColors: true,
        // If true, include stack traces in failures.
        includeStackTrace: true,
        // Default time to wait in ms before a test fails.
        defaultTimeoutInterval: 120000
  },
  // compile ts files before run test
  beforeLaunch: function() {
    require('ts-node').register({
      project: 'e2e'
    });
  }
};

In my case the application isn't a single-page application: every functionality requires loading a separate page. However, I am going to change it if I find free time.

My notes & tips:
  1. It is good to create a utils class with static methods.
  2. Create one shared authorization method and use it in all tests.
  3. For some tests it is good to log out and even clean the cookies. It is possible with: browser.driver.manage().deleteAllCookies();
  4. Sometimes it is good to create a screenshot, and it is easy to do.
  5. The snippet below (with the needed fs imports) saves a screenshot:

     import { existsSync, mkdirSync, createWriteStream } from 'fs';

     browser.takeScreenshot().then((png) => {
       let date = new Date();
       let path = './target/_test-output';
       if (!existsSync(path)) {
         mkdirSync(path);
       }
       let stream = createWriteStream(path + '/test_' + date.getTime() + '.png');
       stream.write(new Buffer(png, 'base64'));
       stream.end();
     });
    
  6. In the configuration file it is good to set a long enough timeout interval. I assumed 2 minutes. In synchronisation mode it is possible to override the default 11 s timeout with the allScriptsTimeout parameter.
  7. Look out if you use $timeout. Some daemon task can block synchronisation and the test will fail on a timeout. Better to use $interval.
  8. If you don't use synchronisation, warm up your application before the test or set a sleep time before each action that loads data from the server. E.g. my application loads dictionaries from the database, but all constant values are kept in a cache. Loading from the database is of course much slower than from the cache. It is important to remember that.
  9. Some examples of CSS selectors:
  • td:nth-child(2) input
  • td:nth-child(1) img#saveButton
  • input[title="Is active?"]




Friday, 3 March 2017

This week 3/2017

This week, during my sickness, I spent a little time exploring Apache's Big Data tools, e.g.
  • Apache Zeppelin
  • Apache Hadoop
  • Apache Spark.
Apache Zeppelin is a kind of notebook which allows you to get data from supported repositories (e.g. Spark, Cassandra, or a JDBC data source such as Postgres) and process it as tasks in notes. You only have to configure the connection, in XML or through the GUI. Zeppelin supports a few ways of presenting data, e.g. tables, grids and diagrams.

Hadoop and Spark are used for distributed data processing. Spark is the newer take on that processing: it mainly uses distributed memory instead of distributed storage as Hadoop does. This allows Spark to be up to 100 times faster than Hadoop, because, as noted, 90% of the time is consumed by reading from and writing to storage.

How does distributed processing work?
At the beginning there is a cluster with hundreds or thousands of servers. Say we have to compute one hundred arithmetic tasks. To be fast and fault tolerant, the master node of the cluster splits the tasks between e.g. 200 servers, so the same task is solved on two separate machines; if one fails, the second one will still return a solution. In the end the main node collects the solutions and returns them to the client.

Hadoop and Spark process big data using the MapReduce programming model. MapReduce retrieves useful data from the huge volumes archived in Big Data resources such as NoSQL databases, files and others.


In my case I prepared a small input file and implemented:
- a data producer - a class which retrieves the interesting data from each row of my file,
- a mapReducer - a class which implements the aggregation algorithm for the data retrieved by the producer.

Then I only needed to execute my classes with Hadoop.
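Locally, the producer/mapReducer pair can be sketched with a simple word-count example (the class and method names are my own, not Hadoop's API; Hadoop distributes exactly this model across machines):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class MapReduceSketch {

    public static Map<String, Long> wordCount(String[] rows) {
        return Arrays.stream(rows)
                // map phase: the "producer" extracts words from each input row
                .flatMap(row -> Arrays.stream(row.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // shuffle + reduce phase: group by key and aggregate the counts
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] input = { "to be or not to be", "to do" };
        System.out.println(wordCount(input));
    }
}
```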

If your impression after my article is that Hadoop and Spark are the same tools with few differences, you are wrong. I am not an expert in this subject and I have only written about the main differences. My limited experience is that they are different tools with different modules, though some are probably shared.


Resources:
  1. Hadoop Tutorial
  2. Apache Spark Tutorial
  3. Apache Spark
  4. Apache Hadoop
  5. Apache Zeppelin

Friday, 3 February 2017

This week 2/2017

One hopeless project in which I took part taught me a few tricks in distributed database architecture, where each database has its specific role in my organisation.

Remote queries
My solution involves 3 Oracle 11g+ databases:
1. DB A - designed as storage for all changes applied to the source system database,
2. DB D - designed as storage for reports - a data warehouse,
3. DB B - the target database, where I need to collect specific data from the other databases and generate a report.
To get data from A and D I use a dblink.
At the beginning I created plain PL/SQL queries with business logic. When I executed them the first time, I exceeded the database temp area (128 GB). It was impossible to get the data without a few tricks.
First I have to describe what happened. So... if I have a join like this:


SELECT A.ID,
       A.NAME,
       B.NAMEB
  FROM TAB_A@REMOTE_DB A
  LEFT JOIN TAB_B@REMOTE_DB B
    ON (A.ID = B.ID)
 WHERE B.ID IN (SELECT C.FK_ID
                  FROM TAB_C@REMOTE_DB C
                 WHERE C.NAME = 'Tom')

then database B executes 3 queries, transfers the data from the remote database to the local area, and matches the rows locally.
If we have huge tables but the select returns far fewer rows, the query should be executed remotely and only the results should be transferred to the client.
To force the database to do this, there is a hint:


DRIVING_SITE(remote-table-alias)

Unfortunately my experience shows that it doesn't work with materialized views, with creating tables from a query, etc. Anyway, I found a workaround for this problem using a procedure with a declared cursor; this way I could insert data from a query executed fully remotely.
In my case this solution was useful only during testing. In the end I had to move the logic to the server side and create some views there.

Parallel execution
The second useful hint forces the database to execute a query in parallel mode.

PARALLEL(4)

This hint, like most hints, doesn't work through a dblink, but it is useful locally.


Materialized view

I had a chance to test the performance of creating and refreshing materialized views for the remote queries described above. The results of those tests are below:

Description of case                                                                         Estimated time [s]
create the materialized view                                                                100
refresh with DBMS_MVIEW.REFRESH(MVIEW_NAME)                                                 3000
truncate the container table, then DBMS_MVIEW.REFRESH(MVIEW_NAME, ATOMIC_REFRESH => FALSE)  104
drop and create the materialized view                                                       110
refresh with DBMS_MVIEW.REFRESH(MVIEW_NAME, ATOMIC_REFRESH => FALSE)                        98
refresh with DBMS_MVIEW.REFRESH('MVIEW_NAME', PARALLELISM => 4, ATOMIC_REFRESH => FALSE)    94

By default a materialized view refresh has ATOMIC_REFRESH set to true, and all operations are made in one transaction, one by one. In my case that is not required.

When refreshing more than one materialized view, it is possible to do it in parallel by passing a list of materialized views, e.g.:

DBMS_MVIEW.REFRESH(LIST => 'MV_A,MV_B,MV_C', PARALLELISM => 4, ATOMIC_REFRESH => FALSE);

If there are dependencies between materialized views, it is possible to turn on a specialised analyser, but I didn't use it.


Grouping in partitions
By the way, PL/SQL allows grouping into subgroups. E.g. if there is a table with non-unique customer numbers and we need a customer's last inserted name, it is possible with this query:


SELECT GC.* 
FROM (SELECT P.ID,
             P.NAME,
             P.CUSTOMER_NO,
             ROW_NUMBER() OVER(PARTITION BY P.CUSTOMER_NO 
                                   ORDER BY P.ID DESC) AS ROW_NUMBER 
        FROM MY_CUSTOMERS P ) GC 
WHERE GC.CUSTOMER_NO = 12345 
  AND GC.ROW_NUMBER = 1