This post describes my first impressions after completing a basic Julia course.
I am an experienced Java developer, but I have also had contact with C/C++, Python and Octave. For me, Julia has something from all of those languages.
```julia
# for loops
for i in 1:10, j in 1:20
    println("Hi $i , $j")
end

for item in items
    println("Hi $item")
end

# function definition
function power(x)
    # the value of the last expression is returned
    x^2
end

# other ways to define a function
power(x) = x^2
power = x -> x^2

# non-mutating sort (returns a sorted copy)
sort(x)
# mutating sort (sorts in place)
sort!(x)
```
Benchmarks which I saw in the course [3] show that Julia's performance is similar to, or even slightly better than, C code, and about two orders of magnitude faster than Python.
In this article I describe Neo4j in a nutshell. It covers the following topics:
a short description of the database,
in which cases it is worth considering a graph database,
what the advantages are compared to a relational database,
a short description of the "graph SQL" - Cypher,
a few examples of queries in Cypher,
a short guide on how to run a Java project with Spring Boot and Spring Data dependencies.
Neo4j is a graph database. It is transactional and ACID compliant, with native graph storage and processing. It uses a "graph SQL" language called Cypher, dedicated to graph databases.
Graph databases can be used wherever there is a need to store graph-like dependencies between objects - so, I would say, in most cases I know.
Compared to relational databases, native storage and processing has the advantage that relationship-matching queries execute faster than the equivalent relational joins, whose cost grows rapidly with the depth of the relationship.
Take the simple case of customers using some services - a many-to-many relation.
In a relational database this requires a join table holding the ids of services and of the customers using them. To find all customers of a single service, it is required to find the service id, then find the customer ids in the join table, and then find the customers themselves in a third table.
In a graph database every service node (every object is a node - roughly the equivalent of a table row) stores direct relationships to its customer nodes. This requires more storage, but it is much faster than going through a join table. Other advantages are:
- an automatically extending schema, as in other NoSQL databases - when data is added for a node, a relationship, or a property on a node or relationship, the schema is extended automagically,
- a dedicated "graph SQL" called Cypher.
Cypher is a language dedicated to graph databases. Below I have placed a few examples.
```cypher
// simple select "from a table"
MATCH (c:Human) WHERE id(c) = 1 RETURN c;
// equivalent in SQL
SELECT * FROM Human WHERE id = 1;

// simple relation
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) WHERE p.name = 'Tom' RETURN p, r, m;
// equivalent in SQL
SELECT * FROM Person p
  JOIN Relation r ON p.id = r.person_id
  JOIN Movie m ON m.id = r.movie_id
WHERE p.name = 'Tom' AND r.type = 'ACTED_IN';
```
A more complicated case is when there is a need to follow a chain of relations, e.g. who is above an employee in the hierarchy. Isn't that a typical graph case? In Oracle PL/SQL there is the "CONNECT BY" query construction, but I honestly don't know how other databases handle it. In MySQL I once saw a recursive procedure storing each level in a temporary table. So how does it look in Cypher?
```cypher
// this returns the supervisors and their supervisors
MATCH path = (n:Person)-[r:REPORTS_TO*]->(s)
WHERE n.name = 'Tom'
RETURN s ORDER BY length(path)

// the opposite case: subordinates found by supervisor id
MATCH path = (n)-[r:REPORTS_TO*]->(s:Person)
WHERE id(s) = 12
RETURN n ORDER BY length(path)
```
```cypher
// delete the whole graph
MATCH (n) DETACH DELETE n

// create nodes and a data set
CREATE (:Car:Vehicle {type: "Van"})<-[:DRIVES]-(:Human {name: "Basia"})

// update data
MATCH (h:Human)-[d:DRIVES]->(c:Car:Vehicle)
WHERE c.type = "Van"
SET h.name = "Ula", c.productionYear = "1982"
RETURN h, d, c

// constraint - unique field
CREATE CONSTRAINT ON (h:Human) ASSERT h.name IS UNIQUE

// delete data matching the query
MATCH (c:Car:Vehicle)<-[d:DRIVES]-(h:Human)
DELETE c, d, h
```
Spring Boot project.
In Spring Boot with Spring Data it is enough to add the spring-boot-starter-data-neo4j artifact, set the Neo4j properties under spring.data.neo4j.*, and add @EnableNeo4jRepositories to the configuration - then it is possible to create node entities.
In the background this adds dependencies on the org.neo4j:neo4j-ogm-* artifacts, spring-data-neo4j, and the org.neo4j.driver artifact.
I was working with:
- a Docker image of Neo4j 4.1.1, without auth,
- JDK 11,
- Spring Boot 2.2.4.
This article is a quick overview of the Docker tools that are commonly used in a micro-service architecture:
Docker
Docker Compose
Docker is a platform for running applications in containers. Containers are created from images, which are built incrementally, layer after layer - similar to commits in a code repository.
A container is an environment for running an isolated application. Unlike a virtual machine, it doesn't use its own operating system - it shares it with the host. That's why a container starts up in seconds and is much lighter on the physical machine, instead of taking minutes to boot like a virtual machine. That's also why Docker is commonly used to spin up application instances.
Docker can be used interactively from the console. Below are the most useful commands:
- docker ps - list running containers,
- docker images - show images in the local repository,
- docker run -d [image_name] - run an image in detached (daemon) mode,
- docker exec -it [container_id] "[command to run in the container, e.g. /bin/sh]" - attach to a specific container and execute a command in it,
- docker container logs [container_id] - print logs from a container,
- docker pull [image_name] - pull an image from an external image repository.
But the biggest benefit of Docker is that it can be driven by scripts, so the whole process is repeatable and can be automated. The default build file is named Dockerfile. Below is an example:
```dockerfile
# base image for this build
FROM openjdk:8-jdk-alpine

# define which directory should be mounted as a volume
# - mounted directories are created under /var/lib/docker/volumes
VOLUME /tmp

# only documents which ports the application can expose
EXPOSE 8080

# define a build-time variable
ARG JAR_FILE=target/*.jar

# define a variable from an environment variable, or "v1.0.0" if not defined.
# ENV overrides an ARG variable. Example build with a variable:
# $ docker build --build-arg CONT_IMG_VER=v2.0.1 .
ENV SOME_ENV_VAR ${CONT_IMG_VER:-v1.0.0}

# copy a file from the host into the image
COPY ${JAR_FILE} app.jar

# also copies a file from the host into the image, but compared to COPY
# it can also fetch a file from a URL and extract tar archives
ADD ${JAR_FILE} app.jar

# run a command inside the image during the build
RUN uname -a

# health check command - docker checks whether the application works properly
HEALTHCHECK --interval=5m --timeout=3s --retries=5 \
  CMD curl -f http://localhost/ || exit 1

# run the application as the goal of this image
ENTRYPOINT ["java", "-Djava.security.egd=file:/dev/./urandom", "-jar", "/app.jar"]
```
Having a Dockerfile, the image is built with the command:
docker build
and then, if the image has to be pushed to a remote repository:
docker push
Docker Compose is a tool to stand up several containers based on a docker-compose.yml file. The tool manages the dependencies between containers, so with one command it is possible to run many services (containers). Below are a few of the most useful commands:
- docker-compose build - build the images defined in docker-compose.yml,
- docker-compose up -d - run the containers in daemon mode,
- docker-compose down - stop the containers,
- docker-compose logs - print logs from the containers.
```yaml
# version of the file format
version: "3.3"
# definition of services (container templates)
services:
  # name of the service
  mongoDB:
    # image name - this image is pulled from a remote repository
    image: library/mongo:4.4.0
    # container name
    container_name: "mongoDBcontainerName"
    # what to do if the application dies
    restart: on-failure
    # ports exposed to the host (host port : container port)
    ports:
      - 27017:27017
    # images define variables; this is how their values are set
    environment:
      MONGO_INITDB_ROOT_USERNAME: sboot
      MONGO_INITDB_ROOT_PASSWORD: example
      MONGO_INITDB_DATABASE: test
    # storage mapping (host path : container path : access mode)
    volumes:
      - ./src/main/sql/mongo-init.js:/docker-entrypoint-initdb.d/mongo-init.js:ro
  app:
    # build properties - this service will be built
    build:
      # build context path on the host
      context: ./
      # docker file
      dockerfile: Dockerfile
    container_name: "myApp"
    # services this one depends on
    depends_on:
      - mongoDB
    # defines DNS names inside the container for the dependent services
    links:
      - mongoDB
```
To prepare this article I used:
Docker in version 19.03.6 - provided by the system,
Docker Compose in version 1.17.1 - provided by the system.
This article is a continuation of the Machine Learning series. I present a few pieces of advice given by Andrew Ng in his Coursera course. They are useful when building a Machine Learning System (MLS). This article covers:
how to prepare data,
how to debug the system,
what skewed classes are,
how to carry out ceiling analysis.
Preparing the data set:
On a small data set (up to ~10-100 000 records) it is recommended to split the randomized data set in the following proportions:
60% - training records - used to train the algorithm, i.e. to find the θ parameters giving the lowest cost.
20% - cross-validation records - used to select the best configuration of the algorithm, e.g. for a Neural Network (NN) to check how many layers the network should have, or to remove useless features.
20% - test records - used to measure the performance of the MLS.
For a large data set (above 100 000 records) it is recommended to change the proportions to 92%/4%/4% respectively.
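As an illustration, here is a minimal NumPy sketch of such a randomized 60/20/20 split (my own example, not course code; X and y are assumed to be arrays of features and labels):

```python
import numpy as np

def split_dataset(X, y, seed=42):
    """Shuffle the data, then split it 60% / 20% / 20% into train / cross-validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_cv = int(0.6 * len(X)), int(0.2 * len(X))
    train, cv, test = np.split(idx, [n_train, n_train + n_cv])
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])

X, y = np.random.randn(1000, 5), np.random.randint(0, 2, 1000)
(train_X, train_y), (cv_X, cv_y), (test_X, test_y) = split_dataset(X, y)
print(len(train_X), len(cv_X), len(test_X))   # 600 200 200
```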
Debugging the MLS:
To improve the MLS it is good to perform error analysis and then consider:
- using more training examples,
- changing the set of features (fewer/more/different ones),
- adding polynomial features,
- changing the λ value in the regularization term,
- changing the number of nodes or layers (for NN).
Size of the training set - below I added a chart showing the dependency between the cost function and the number of records used from the training set (the learning curve).
On the left chart it can be noticed that in the high-bias case adding more data does not decrease the high error. However, when the function is complicated (high variance), a large gap between the training and cross-validation errors can be observed, and the cross-validation error slowly decreases as more data is added.
This can be manipulated by changing the set of features (fewer/more). Below I added a chart of the dependency between the cost (error) and the complexity of the fitted function, together with example functions for one data set.
How exactly is this done? First, the function is trained on the training data, and then the cross-validation error is calculated for a few configurations of features.
When high bias is observed, it can mean that the fitted function is too simple for the prepared data set. It may be necessary to add new features or to create polynomial features from the existing ones.
When high variance is observed, it can mean that the fitted function is too complex. It may be necessary to remove some features.
It is also possible to manipulate bias and variance by changing the λ of the regularization term. Below I added 3 charts: for a very big λ, a "just right" λ, and a λ close or equal to 0.
It can be noticed that a too big λ creates an almost constant function. When λ is close to 0, the regularization term is negligibly small and can be skipped.
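To make the λ selection concrete, here is a small self-contained sketch (my own example, not the course code): a regularized polynomial regression is trained with several λ values, and the λ with the lowest cross-validation error is picked.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form regularized linear regression; the bias column is not regularized."""
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def cost(theta, X, y):
    """Mean squared error / 2, without regularization - used for evaluation."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = x**3 - 2 * x + rng.normal(0, 3, 200)          # noisy cubic data
X = np.column_stack([x**p for p in range(9)])     # polynomial features up to degree 8
X_train, y_train, X_cv, y_cv = X[:120], y[:120], X[120:160], y[120:160]

lambdas = [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
cv_errors = {lam: cost(fit_ridge(X_train, y_train, lam), X_cv, y_cv) for lam in lambdas}
print("lambda with the lowest cross-validation error:", min(cv_errors, key=cv_errors.get))
```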
Skewed classes
This term refers to a situation where the data set of one category is much larger than that of the other category, e.g. for a binary output, when 99% of the examples belong to the "true" category and 1% to the "false" category. Then there is not much difference in raw accuracy between a trained logistic regression algorithm and a trivial system that always returns "true" - at most 1 percentage point - which doesn't sound bad, even though the two systems are fundamentally different.
That's why, to compare systems like these, the following terms are defined:
- true positive,
- true negative,
- false negative,
- false positive
described in the drawing below:
and measures:
- precision - the ratio of true positives to all positive predictions (true positives plus false positives):
$$ precision = \frac{TP}{TP + FP} $$
- recall - the ratio of true positives to all actual positives (true positives plus false negatives):
$$recall = \frac{TP}{TP+FN} $$
Combining precision (P) and recall (R) gives a single measure, the F1 score:
$$ F_1 = 2\cdot\frac{P \cdot R}{P+R} $$
so a bigger score means a better system.
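A small sketch (my own example) showing how these measures can be computed from predictions:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 score for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# a skewed example: 9 negatives and 1 positive, with one false positive
print(precision_recall_f1([0]*9 + [1], [0]*8 + [1, 1]))   # (0.5, 1.0, 0.666...)
```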
The last term in this article is ceiling analysis - this is more of an economic term, because it looks at the whole system as a set of MLS modules working in a pipeline.
This analysis answers the question of which module should be improved to get higher accuracy of the whole application.
1. Supervised Learning
1.1 Linear regression - the algorithm adapts the parameters of an equation to approximate the training data and get the lowest cost.
In the course, two methods to achieve that were presented (a small sketch of both follows below):
1.1.1 gradient descent - an iterative approach - in each iteration the cost function should get closer to a local minimum. The main requirements and properties:
- it needs a chosen learning rate α - if too big, the cost may increase; if too low, the number of steps needed to reach the minimum of the cost function increases,
- it needs many iterations,
- it is recommended for a large number of features,
1.1.2 normal equation - a non-iterative way to find θ. The main features of this algorithm:
- no α factor,
- it doesn't iterate to find the minimum of the cost function,
- it requires calculating (X^T X)^(-1), which has complexity O(n^3), so it is slow for a large number of features,
- it can run into problems with matrix inversion, which requires some additional handling (removing redundant features or using regularization).
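A minimal NumPy sketch of both methods on generated data (my own example):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.uniform(0, 10, 100)])   # bias column + one feature
y = X @ np.array([4.0, 2.5]) + rng.normal(0, 1, 100)           # y ~ 4 + 2.5x + noise

# 1.1.1 gradient descent
alpha, theta = 0.01, np.zeros(2)
for _ in range(5000):
    gradient = X.T @ (X @ theta - y) / len(y)
    theta -= alpha * gradient

# 1.1.2 normal equation: theta = (X^T X)^(-1) X^T y; pinv also copes with a non-invertible X^T X
theta_normal = np.linalg.pinv(X.T @ X) @ X.T @ y

print(theta, theta_normal)   # both should be close to [4, 2.5]
```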
1.2 Logistic regression - a classification algorithm that gives a binary output.
For more than 2 classes (n classes), an algorithm with n functions (one-vs-all) is used, and to get the most probable class, the function with the highest output probability is chosen. Issues encountered:
- choosing the correct decision boundary,
- additional optimization algorithms (conjugate gradient, BFGS, L-BFGS) - usually faster but more complex.
The goal is to minimize the cost function. For multi-class classification the prediction is the class whose hypothesis h returns the highest value.
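A minimal sketch (my own example) of the hypothesis and the one-vs-all prediction rule:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def h(theta, X):
    """Logistic regression hypothesis: the probability of the positive class."""
    return sigmoid(X @ theta)

def predict_one_vs_all(thetas, X):
    """thetas holds one parameter vector per class; pick the class with the highest probability."""
    probabilities = np.column_stack([h(theta, X) for theta in thetas])
    return np.argmax(probabilities, axis=1)

X = np.array([[1.0, 0.2], [1.0, 1.5], [1.0, 3.0]])        # bias column + one feature
thetas = [np.array([2.0, -1.5]), np.array([-3.0, 1.0])]    # two one-vs-all classifiers
print(predict_one_vs_all(thetas, X))                       # [0 0 1]
```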
1.3 Neural networks - a classification algorithm consisting of layers of nodes, loosely reflecting the human brain:
- a single neural network layer is essentially logistic regression, so a neural network is a composite classifier and can solve more complex problems,
- it requires initializing the weights with random values to avoid symmetry,
- it requires calculating forward and back propagation (an expensive operation).
Here is an example of a neural network with 3 layers - 2 input nodes, a hidden layer with 3 nodes, 2 nodes in the output layer, and 2 bias nodes.
The function calculating the output of a node is:
$$ h_{\theta}(x) = \frac{1}{1+e^{-{\theta}^Tx}}$$
Using θ^(j-1) (the matrix of weights controlling the mapping from layer j-1 to layer j), the activation of node i in layer j is calculated from the outputs of the nodes m of the previous layer:
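In that notation (my reconstruction of the course formula, with the sigmoid g from the equation above and the bias unit a_0 = 1) the activation is:
$$ a_i^{(j)} = g\left(\sum_{m}\theta_{im}^{(j-1)}\,a_m^{(j-1)}\right) $$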
The cost function is minimized by iteratively improving the θ values. For neural networks it is required to calculate the error terms. The following equations calculate them - for the last layer:
$$ \delta = h_{\theta}(x)-y $$
for the hidden layers 2...L-1 (where L is the number of network layers):
$$ \delta^{(l)} = (\theta^{(l)})^T\delta^{(l+1)} .* a^{(l)} .* (1-a^{(l)}) $$
and back propagation delta:
$$ \Delta_n = \sum_{i=1}^m\delta_n^i*a_{n-1}$$
and the derivative of the cost function (the gradient used to adapt θ), where the regularization term is omitted for the bias units.
The numerical approximation of the derivative is very expensive to compute, so it should be used only to confirm that the back-propagation gradient is correct:
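For reference, my reconstruction of these two course formulas - the regularized gradient built from Δ (with the unregularized bias column j = 0), and the numerical approximation used for gradient checking:
$$ D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \lambda\,\theta_{ij}^{(l)} \quad (j \neq 0), \qquad D_{i0}^{(l)} = \frac{1}{m}\Delta_{i0}^{(l)} $$
$$ \frac{\partial}{\partial\theta_j}J(\theta) \approx \frac{J(\theta_1,\ldots,\theta_j+\epsilon,\ldots,\theta_n)-J(\theta_1,\ldots,\theta_j-\epsilon,\ldots,\theta_n)}{2\epsilon} $$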
1.4 SVM (Support Vector Machine) - there are many ways to build an SVM; below are only the more important ones (a small scikit-learn sketch follows after the cost function):
1.4.1 no kernel ("linear kernel") - used when there are many features but not much training data; this variant is similar to logistic regression,
1.4.2 polynomial kernel - used when there is a significant amount of training data,
1.4.3 Gaussian kernel - used when there are not many features but a significant amount of training data.
$$ \min_{\Theta} C \sum_{i=1}^{m}\left[y^{(i)}\,cost_1(\Theta^Tf^{(i)})+(1-y^{(i)})\,cost_0(\Theta^Tf^{(i)})\right]+\frac{1}{2}\sum_{j=1}^{n}\Theta_j^2 $$
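In practice these three variants map onto scikit-learn's SVM classes; a small sketch, assuming scikit-learn is installed (not part of the course material):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

linear = LinearSVC(C=1.0, max_iter=10000).fit(X, y)            # 1.4.1 no kernel ("linear kernel")
poly = SVC(kernel="poly", degree=3, C=1.0).fit(X, y)           # 1.4.2 polynomial kernel
gaussian = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)   # 1.4.3 Gaussian (RBF) kernel

print(linear.score(X, y), poly.score(X, y), gaussian.score(X, y))
```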
2. Unsupervised Learning - a group of algorithms that look for similarities in the data and aggregate it into a defined number of classes.
- if the number of classes is not imposed, it should be chosen based on the cost function of the trained algorithm (the elbow method).
2.1 K-means - K centroids are randomly initialized from the training set. Then each data point is assigned to the centroid for which the cost is lowest. Iteratively, the mean of each class is moved to get the lowest cost within each class.
This kind of algorithm is used for partitioning data or for assigning product dimensions to groups, e.g. sizes of dresses (S, M, L). The minimized cost function is:
$$J(c^{(1)},...,c^{(m)}, \mu_1,...,\mu_K)= \frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)} - \mu_{c^{(i)}} \rVert ^2$$
where m is the number of training examples and K is the number of centroids (the number of classes).
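A minimal NumPy sketch of the K-means loop described above (my own example; an empty cluster simply keeps its previous centroid):

```python
import numpy as np

def kmeans(X, K, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]     # K centroids picked from the data
    for _ in range(iterations):
        # assignment step: index of the closest centroid for every point
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # update step: move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                              for k in range(K)])
    return labels, centroids

X = np.vstack([np.random.randn(50, 2) + offset for offset in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(X, K=3)
print(centroids)
```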
2.2 PCA (Principal Component Analysis) - a dimensionality-reduction method used for data compression or to reduce data for visualization. The algorithm removes one or more dimensions from each example.
The covariance matrix is calculated as:
$$ \Sigma= \frac{1}{m}\sum_{i=1}^m (x^{(i)})(x^{(i)})^T $$
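A minimal sketch of PCA built from that covariance matrix via SVD (my own example; the data is assumed to be mean-normalized first):

```python
import numpy as np

def pca(X, k):
    """Project mean-normalized data X (m examples x n features) onto the first k principal components."""
    Sigma = (X.T @ X) / len(X)            # covariance matrix (n x n)
    U, S, _ = np.linalg.svd(Sigma)        # columns of U are the principal directions
    U_reduce = U[:, :k]
    Z = X @ U_reduce                      # reduced representation (m x k)
    X_approx = Z @ U_reduce.T             # approximate reconstruction back to n dimensions
    return Z, X_approx

X = np.random.randn(100, 5)
X = X - X.mean(axis=0)                    # mean normalization
Z, X_approx = pca(X, k=2)
print(Z.shape, X_approx.shape)            # (100, 2) (100, 5)
```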
2.3 Anomaly detection - an algorithm used to detect anomalies in data. It could be replaced with supervised learning algorithms, but it is used when there is a huge amount of correct data and few or no cases showing anomalies. It is used to detect anomalies in engines, CPU load, etc.
It is based on the Gaussian distribution, so an anomaly is detected if P(x) < ε, where ε is a defined threshold.
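A minimal sketch of that idea with independent Gaussian features (my own example; the feature meanings and the ε value are made up for illustration):

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean and variance of every feature from (assumed normal) training data."""
    return X.mean(axis=0), X.var(axis=0)

def probability(x, mu, var):
    """P(x) as the product of per-feature Gaussian densities."""
    densities = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.prod(densities)

X_train = np.random.normal(loc=[50.0, 1.0], scale=[5.0, 0.1], size=(1000, 2))  # e.g. CPU load, latency
mu, var = fit_gaussian(X_train)

epsilon = 1e-4                                     # threshold, normally chosen on a labeled CV set
x_new = np.array([85.0, 1.7])
print("anomaly" if probability(x_new, mu, var) < epsilon else "ok")
```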
3. Other ideas
3.1 Recommender Systems - algorithms used by video streaming portals, social media and online stores to suggest other films, friends or products that may be interesting for the customer. The problem could be solved by linear regression, but how deeply something belongs to a category, or how much someone likes a specific characteristic of a product, is subjective. Usually the system has only a little information about the customer, or doesn't know their preferences at all. That's why the collaborative filtering algorithm is used.
The goal is to minimize the cost function:
$$ J(x, \theta)= \frac{1}{2} \sum_{(i,j):r(i,j)=1}((\theta^{(j)})^Tx^{(i)}-y^{(i,j)})^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}(x_k^{(i)})^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}(\theta_k^{(j)})^2$$
where n_u is the number of customers, n_m the number of products, r(i,j)=1 a flag indicating that a customer rated a product, and y(i,j) the value of the customer's rating.
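A vectorized sketch of that cost function (my own example; X holds product features, Theta holds customer parameters, R marks which products were rated):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Collaborative filtering cost: X (products x features), Theta (customers x features),
    Y (products x customers) ratings, R (products x customers) 0/1 mask of rated entries."""
    error = (X @ Theta.T - Y) * R                  # only rated entries contribute
    return (np.sum(error ** 2) + lam * np.sum(X ** 2) + lam * np.sum(Theta ** 2)) / 2

n_products, n_customers, n_features = 5, 4, 3
Y = np.random.randint(1, 6, size=(n_products, n_customers)).astype(float)
R = (np.random.rand(n_products, n_customers) > 0.3).astype(float)
X = np.random.randn(n_products, n_features)
Theta = np.random.randn(n_customers, n_features)
print(cofi_cost(X, Theta, Y, R, lam=1.0))
```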
3.2 Online learning - a system where there is no limit on the input data. The algorithm is constantly learning and improving its predictions. This requires using a proper learning rate α.
Data can also be processed in parallel in batches. This can be achieved with the MapReduce approach.
Batch gradient descent:
$$ \theta_j=\theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)} $$
where m is the number of examples in the batch. Each partial sum is calculated in parallel, and the results are then combined into one update.
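A minimal sketch of that map-reduce style split using Python's multiprocessing (my own example, for one step of linear regression):

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    """Map step: the gradient contribution of one chunk of the batch."""
    X_chunk, y_chunk, theta = args
    return X_chunk.T @ (X_chunk @ theta - y_chunk)

def parallel_gradient_step(X, y, theta, alpha, workers=4):
    chunks = list(zip(np.array_split(X, workers), np.array_split(y, workers), [theta] * workers))
    with Pool(workers) as pool:
        partials = pool.map(partial_gradient, chunks)
    gradient = sum(partials) / len(y)              # reduce step: combine the partial sums
    return theta - alpha * gradient

if __name__ == "__main__":
    X = np.column_stack([np.ones(1000), np.random.randn(1000)])
    y = X @ np.array([1.0, 2.0])
    print(parallel_gradient_step(X, y, np.zeros(2), alpha=0.1))
```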
This is a nutshell summary of the algorithms presented in Andrew's course. I will present more tips and ideas in the next article.
I had always had an aversion to Python, but I finally forced myself to try it. I am developing my skills in the machine learning area, and most examples and algorithm sources are written in Python - life forced me.
In this post I describe, in a nutshell, the basic and most important topics of the language, such as:
- why it was created,
- who uses it and why,
- a very short characterization of the language and the biggest differences I noticed.
Python was created in the late 1980s by Guido van Rossum as an interpreted, interactive, object-oriented, high-level language. The language is named after the Monty Python's Flying Circus TV comedy series. It was designed to be readable and easy to run in an academic environment. It uses dynamic typing, validated at runtime. It supports functional programming, and it is possible to compile Python code to bytecode, which is usually done in bigger applications.
It is worth adding that until this year (2020) there were two major lines: 2.x and 3.x. Since 2008, when version 3 was introduced, the older line has still been maintained. The versions are incompatible with each other, so when you learn Python pay attention to which version you use.
As I write this post the current version is 3.8.3, but I trained on 3.6.9 and I used only a few features of the language.
The creator cared about the interactive console, so each command can be typed ad hoc and executed. That is probably why this language won in the scientific community, where code doesn't need to be compiled before it is executed.
Currently in Python we can find a lot of tools and libraries to load data, process it and present it (plotting libraries, etc.). Dynamic typing is useful when we experiment with code, but from my point of view it can be dangerous at runtime, when we can hit a type incompatibility.
How does Python stand out from other languages? The first difference is that Python enforces code layout. It doesn't use semicolons and braces, so it relies on lines and indentation. This is what doesn't convince me about Python. Of course I like nicely formatted code, but I am used to braces and I don't believe we can live without them.
Another difference is that the language uses the words "not", "and" and "or" in conditional statements.
Python has wide support for lists. A programmer can concatenate lists, repeat their elements, search them, and filter them with just two or three extra characters.
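A few one-liners showing what I mean:

```python
numbers = [1, 2, 3]

print(numbers + [4, 5])                 # concatenation: [1, 2, 3, 4, 5]
print(numbers * 2)                      # repetition: [1, 2, 3, 1, 2, 3]
print(2 in numbers)                     # searching: True
print(numbers[::-1])                    # slicing, here a reversed copy: [3, 2, 1]
print([n for n in numbers if n > 1])    # filtering with a list comprehension: [2, 3]
```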
Python also supports decorators - functions that wrap other functions, as in the example below:

```python
def decorator(annotatedText):
    # definition of the decorator
    def text_generator(old_function):
        def new_function(*args, **kwds):
            return annotatedText + ' ' + old_function(*args, **kwds)
        return new_function
    return text_generator  # it returns the new generator

# Usage
@decorator('prefix')  # text attached before the function result
def return_text(text):
    return text

# Now return_text is decorated and reassigned to itself
print(return_text('myText'))  # 'prefix myText'
```
In this article I will describe what Elm is, how to start an adventure with it, and some details about the language. I am still exploring it, so please forgive me any mistakes.
1. What is Elm lang?
Elm is a statically and strongly typed functional language compiled to JavaScript. The code structure is similar to Python - Elm doesn't use braces but requires indentation. Strong typing protects the developer from most technical errors and from unknown application states. All technical errors are caught at compile time, and the developer is informed about them with detailed messages which usually contain a suggestion on how to fix them.
2. What tools does Elm include?
The elm command supports development and module upgrading. The most useful commands are:
elm init - initializes the project structure: it creates the src directory and the elm.json file,
elm repl - starts an interactive programming session,
elm reactor - runs a local web server to preview the project,
elm make - compiles code to JavaScript,
elm install - fetches packages,
and less popular:
elm-test init - creates a tests directory with example sources and updates the test dependencies in elm.json,
elm bump - updates the version of your package based on the changes in its API,
elm diff - shows the changes between versions of a package,
elm publish - publishes your code in the Elm package repository.
For more details please check the documentation.
3. How to start?
Using npm, the Elm tools can be installed with a few commands:
npm install elm
npm install elm-format
npm install elm-test
and then, to initialize the first project:
elm init
and to initialize tests for this project:
elm-test init
In the project directory the following directories and files are created:
src - the directory where production sources should be stored,
tests - the directory with test sources,
elm.json - a file with the project description and dependencies.
When the project is initialized, we can create the first Elm application.
I use IntelliJ with the Elm plug-in; however, it is possible to create an Elm source file in any text editor and save it with the .elm extension.
When the Elm file is created, it should be compiled to JavaScript. This is done with the command:
elm make src/Main.elm
By default a file "index.html" is created with the JavaScript included.
4. Architecture
In a nutshell, about the architecture of Elm.
Elm uses the Model-View-Update pattern. To update the view, a virtual DOM tree is used: each update operation creates a new copy of the virtual DOM tree, the new copy is compared with the previous one, and then all the changes are finally applied to the real DOM tree in one big batch. This solution greatly speeds up changes to the real DOM tree.
All values in Elm are immutable by design. The language offers:
- simple types:
Bool
Int
Float
Char
String
- complex types:
a typed List is a linked list, which simplifies operations on it. A List can be created by collecting elements of one type in square brackets
[elem, elem, elem]
or new elements can be prepended by calling
elem :: [elem]
an Array is also typed, like a List, and can be created from a List. An Array allows direct access to each element,
a tuple is a set of elements of different types and is typed as well. A tuple is created by collecting elements in round brackets
( elemA, elemB, elemC )
a record is a data structure. A record is created by collecting field names and values in braces
var1 = { field1 = elemA, field2 = elemB }
or
var1 = RecordType elemA elemB
where the record's values must be in the same order as in the definition
type alias RecordType = { field1 : String, field2 : Int}
Maybe is a wrapper over a value used to avoid null pointers. It has two variants: Just with a value, or Nothing.
- custom types - created by the developer
type UserStatus = Regular | Visitor
- special types:
"_" has a special meaning: it matches any value. It can be used as the default branch in a case construction or as an unused input of a function,
the unit type "()" - represents an empty value,
an inline (anonymous) function requires "\" before its declaration
\elem -> elem + 1
the result of a function can be piped to the function on the left with "<|" or to the right with "|>".
Let / if / case constructions:
let
    definitions
in
    function body

case variable of
    case_element -> body handling this case
    _ -> body of the default handling

if condition then
    body
else
    body
Modules:
When the application grows bigger and bigger, it is necessary to split the code into separate files. Elm treats each separate file as a module. Each module can contain private or public elements, which is declared in the header of the Elm file.
It is also possible to mix those approaches, e.g.
import Module as M exposing (exposing_fun1,exposing_fun2)
If there is a need to move modules into subfolders, the module name is preceded by the folder path separated with dots (similar to Java packages), e.g.
module Folder1.Folder2.ModuleName exposing (..)
To compile the application, it is only necessary to indicate its main module.
source: https://elmprogramming.com/
Ports
Elm can run isolated from the surrounding world, or it can communicate with it. When Elm needs to communicate with JavaScript, it is required to add the port specifier:
```elm
port module MainTable exposing (..)
```
When the communication goes from Elm to JavaScript, it is only required to define a port function in Elm and a callback function in JavaScript:
```
-- ELM --
port sendData : String -> Cmd msg

-- JavaScript --
app.ports.sendData.subscribe(function (data) {
    alert("Data from Elm: " + data);
});
```
In the other direction, it is required to define the subscriptions parameter in Browser.element and a handling port function in Elm, and then to call the port function from the JavaScript code.
Similar to other languages, the "main" function is defined as the entry point of the application. If a module has no main function and the compilation output is anything other than /dev/null, an error is thrown.
6. Test
At the beginning it is required to install elm-test:
npm install elm-test
which modifies the file "elm.json".
The test module provides developers with a few tools:
Test - test definitions,
Expect - a set of assertions,
Fuzz - a tool that generates random data and runs the test for each generated value.