Peculiar Java JSON serializer problems

Finally, in my Semantic Web developer career, I reached the point where I had to do some work with JSON in Java. As it usually goes, one needs to read some JSON, say from an API, and make it available in a POJO (deserialize), and vice versa (serialize).
The only requirement I had was to avoid using annotations of any kind, such as @JsonProperty or @JsonDeserialize. Let’s take as an example a JSON object describing a person:
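A minimal sketch of such an object could look as follows; the @name and degree properties are the ones discussed below, all values are purely illustrative:

    {
      "@name": "Jane Doe",
      "age": 30,
      "degree": {
        "title": "MSc",
        "field": "Computer Science",
        "year": 2015
      }
    }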

Notice the @name property, and that the degree property is a nested JSON object which can contain a varying number of sub-properties, not known in advance.

First, we deserialize the above JSON into a Person POJO, then some data gets changed, and then it gets serialized back to a JSON object.

The POJO of the above example JSON is called Person.java and looks like this:
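A sketch of that POJO, assuming plain getters and setters and field names following the example above:

    import java.util.Map;

    public class Person {

        private String name;                // maps to the "@name" JSON property
        private int age;
        private Map<String, Object> degree; // nested object with unknown sub-properties

        // getters and setters omitted for brevity
    }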

The focus is now on the serialization of this POJO.

Since I could not use annotations (@JsonProperty or @JsonDeserialize), I had to write a custom serializer or adapter (depending on which JSON library is used). This is where the peculiarities started.

My solution is written for both libraries: Jackson and JSON-B with Yasson. Next, I will go through each of them.

Let’s start with Jackson.

I pulled in Jackson version 2.13.4 from Maven and started with a custom serializer. The custom serializer needs to override the serialize method of the extended StdSerializer<Person> class. The serialize method is where the customization happens:
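A sketch of that first attempt, assuming the Person getters from above; the degree line is the flawed part:

    import java.io.IOException;

    import com.fasterxml.jackson.core.JsonGenerator;
    import com.fasterxml.jackson.databind.SerializerProvider;
    import com.fasterxml.jackson.databind.ser.std.StdSerializer;

    public class PersonSerializer extends StdSerializer<Person> {

        public PersonSerializer() {
            super(Person.class);
        }

        @Override
        public void serialize(Person person, JsonGenerator gen, SerializerProvider provider)
                throws IOException {
            gen.writeStartObject();
            gen.writeStringField("@name", person.getName());   // rename to "@name"
            gen.writeNumberField("age", person.getAge());
            // first (flawed) attempt: write the Map out as a plain string
            gen.writeStringField("degree", person.getDegree().toString());
            gen.writeEndObject();
        }
    }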

Above, I am attempting to write the degree property, which in the POJO is of type Map<String, Object>, out as a string. This ends up being serialized to JSON as follows:
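With the illustrative values from above, the output would look roughly like this (Map.toString() produces the key=value notation):

    {
      "@name": "Jane Doe",
      "age": 30,
      "degree": "{title=MSc, field=Computer Science, year=2015}"
    }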

This did solve the @name property, but the degree property is quite wrong: it is a String.

So how can this be improved with Jackson?

The solution is to use writePOJOField(). The correct code looks as follows:
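A sketch of the corrected serialize method; only the degree line changes compared to the first attempt:

    @Override
    public void serialize(Person person, JsonGenerator gen, SerializerProvider provider)
            throws IOException {
        gen.writeStartObject();
        gen.writeStringField("@name", person.getName());
        gen.writeNumberField("age", person.getAge());
        // writePOJOField delegates back to Jackson, which serializes the Map
        // as a proper nested JSON object
        gen.writePOJOField("degree", person.getDegree());
        gen.writeEndObject();
    }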

Awesome! Solved it for Jackson. Find the full code on GitHub.

Let’s now see the solution using JSON-B.

I have the same requirement: not to use any annotations. For this, I pulled in the JSON-B API version 1.0.2 from Maven Central and Yasson 1.0.3.

In the case of JSON-B, I needed to write an adapter to deal with the @name property. As it goes, the adapter needs to override the adaptToJson (serialize) and adaptFromJson (deserialize) methods of the JsonbAdapter<Person, JsonObject> interface. So, the first go at it looks like the following (I only focus on the adaptToJson method here):
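A sketch of that first attempt (the adaptFromJson side is left out here, as in the post):

    import javax.json.Json;
    import javax.json.JsonObject;
    import javax.json.bind.adapter.JsonbAdapter;

    public class PersonAdapter implements JsonbAdapter<Person, JsonObject> {

        @Override
        public JsonObject adaptToJson(Person person) {
            return Json.createObjectBuilder()
                    .add("@name", person.getName())               // rename to "@name"
                    .add("age", person.getAge())
                    // first (flawed) attempt: the Map ends up as a plain string
                    .add("degree", person.getDegree().toString())
                    .build();
        }

        @Override
        public Person adaptFromJson(JsonObject json) {
            // the deserialization side is in the full code on GitHub
            throw new UnsupportedOperationException("not shown here");
        }
    }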

If we take a look at the serialized JSON, we have the same problem as with Jackson. The @name property is OK, but the degree property is wrong again! The generated JSON looks as follows:
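Again with the illustrative values from above, roughly:

    {
      "@name": "Jane Doe",
      "age": 30,
      "degree": "{title=MSc, field=Computer Science, year=2015}"
    }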

So how can this be done any better?
While Jackson has the brilliant writePOJOField method, JSON-B does not have such a method. My idea was for the adaptToJson method to use a helper method called addRightJsonType. So the correct adaptToJson is:
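A sketch of the corrected method; it builds the degree object entry by entry (the adapter additionally imports javax.json.JsonObjectBuilder and java.util.Map):

    @Override
    public JsonObject adaptToJson(Person person) {
        JsonObjectBuilder builder = Json.createObjectBuilder()
                .add("@name", person.getName())
                .add("age", person.getAge());

        // build the degree object explicitly, entry by entry
        JsonObjectBuilder degreeBuilder = Json.createObjectBuilder();
        for (Map.Entry<String, Object> entry : person.getDegree().entrySet()) {
            addRightJsonType(degreeBuilder, entry.getKey(), entry.getValue());
        }
        return builder.add("degree", degreeBuilder).build();
    }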

And the helper method addRightJsonType is a recursive method that tries to catch all possible types of the entry value and deal with them accordingly:
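A sketch covering the most common types; as the remark at the end of the post says, some if branches are omitted for simplicity:

    @SuppressWarnings("unchecked")
    private void addRightJsonType(JsonObjectBuilder builder, String key, Object value) {
        if (value instanceof String) {
            builder.add(key, (String) value);
        } else if (value instanceof Integer) {
            builder.add(key, (Integer) value);
        } else if (value instanceof Boolean) {
            builder.add(key, (Boolean) value);
        } else if (value instanceof Map) {
            // nested JSON object: recurse into its entries
            JsonObjectBuilder nested = Json.createObjectBuilder();
            for (Map.Entry<String, Object> entry : ((Map<String, Object>) value).entrySet()) {
                addRightJsonType(nested, entry.getKey(), entry.getValue());
            }
            builder.add(key, nested);
        } else if (value == null) {
            builder.addNull(key);
        }
    }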

Find the complete code on GitHub, which also includes the code for the adaptFromJson method.

Conclusion

When I set out to serialize JSON in Java, little did I know that this task would come with some peculiarities. Writing a Jackson serializer or a JSON-B adapter means one needs to specify in detail how each property is to be handled. There are no shortcuts like using .toString() on a Map<String, Object> degree property. For some reason, I initially thought that by simply omitting it completely, the library would still magically know how to deal with it 😅.

I also see how Jackson, by having a dedicated writePOJOField method, can be considered the most mature Java JSON parsing library. I say this also because, while searching online for solutions, I found that other libraries have the same problem as Yasson: no dedicated method for this simple task.

Remark: for code simplicity, some if statements were omitted in the above code. See the full code on GitHub.

Let’s recall transaction processing (Java with Spring) – part 2

So let’s get back to the overview of transactions in Java with the Spring Framework.

Transaction management in Spring can be:

  • container-managed transactions, also known as declarative transaction management
  • application-managed transactions, also known as programmatic transaction management

Declarative transaction management can be XML-based or annotation-based. A disadvantage of declarative transactions is that, while a method is executing, it can be associated with either a single transaction or no transaction at all.

Programmatic transaction management gives more liberty. Take as an example this pseudocode from the Java EE 6 Tutorial (quite old, but it makes the point):

begin transaction
...
    update table-a
...
    if (condition-x)
        commit transaction
    else if (condition-y)
        update table-b
        commit transaction
    else
        rollback transaction
        begin transaction
        update table-c
        commit transaction

This fine-grained programmatic control over when to commit or roll back can only be achieved with programmatic, not declarative, transactions.
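A minimal sketch of how such control looks with Spring’s programmatic API; the class name and the table updates are illustrative:

    import org.springframework.transaction.PlatformTransactionManager;
    import org.springframework.transaction.TransactionStatus;
    import org.springframework.transaction.support.DefaultTransactionDefinition;

    public class TableUpdateService {

        private final PlatformTransactionManager txManager;

        public TableUpdateService(PlatformTransactionManager txManager) {
            this.txManager = txManager;
        }

        public void updateTables() {
            // the business logic decides where the transaction begins...
            TransactionStatus status = txManager.getTransaction(new DefaultTransactionDefinition());
            try {
                // ... update table-a, table-b, ...
                txManager.commit(status);   // ...and exactly when it commits
            } catch (RuntimeException e) {
                txManager.rollback(status); // or when it rolls back
                throw e;
            }
        }
    }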

Transaction Propagation

Not so obviously mentioned is what the DEFAULT propagation in Spring transactions is.

  • The default propagation is REQUIRED
Propagation types and their behaviour:

PROPAGATION TYPE   | no current transaction                                          | there’s a current transaction
MANDATORY          | throw exception                                                 | use current transaction
NEVER              | don’t create a transaction, run method outside any transaction | throw exception
NOT_SUPPORTED      | don’t create a transaction, run method outside any transaction | suspend current transaction, run method outside any transaction
SUPPORTS           | don’t create a transaction, run method outside any transaction | use current transaction
REQUIRED (default) | create a new transaction                                        | use current transaction
REQUIRES_NEW       | create a new transaction                                        | suspend current transaction, create a new independent transaction
NESTED             | create a new transaction                                        | create a new nested transaction
The table is from ninjalj’s blog.

So let’s see what this means in the context of a database connection, and what the other propagation types do.
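As a quick illustration of two of these types, a sketch with made-up service and method names:

    import org.springframework.transaction.annotation.Propagation;
    import org.springframework.transaction.annotation.Transactional;

    public class AuditedService {

        // REQUIRED (the default): join the caller's transaction if there is one,
        // otherwise create a new transaction
        @Transactional
        public void recordChange() { /* ... */ }

        // REQUIRES_NEW: suspend the caller's transaction and always run in a
        // new, independent transaction
        @Transactional(propagation = Propagation.REQUIRES_NEW)
        public void writeAuditLog() { /* ... */ }
    }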

And because I never went into too much detail here, I recommend reading Marco Behler’s blog to get the full picture.

Let’s recall transaction processing (Java with Spring)

Generalities about transactions

I am writing a piece about transactions because the subject sounds so heavy on its own, and because I recently had to recall all the theory about it from university. And surprise, surprise: real-life software behaves and looks different than in theory. So, if you are looking to refresh your knowledge, or simply want the bullet points to solidify it, keep reading… (at least the bullet points).

“Transaction processing is information processing in computer science that is divided into individual, indivisible operations called transactions. Each transaction must succeed or fail as a complete unit; it can never be only partially complete.”

“Transaction processing is designed to maintain a system’s integrity in a known, consistent state, by ensuring that interdependent operations on the system are either all completed successfully or all canceled successfully.” Wikipedia

Just by reading the definition of transactions, we are reminded of the correct way of using them and what the goal should be:

  • indivisible operations
  • succeed or fail as a complete unit
  • maintain a system’s integrity

We have this on our agenda, so let’s turn to the technicalities of it.

A good example, which occurs most often in real-life software, is working with databases. Databases have a built-in mechanism for transactional usage, which is either automatic – the database takes care of it on its own – or more “manual” – it is up to the developer to decide when a transaction starts and ends. Find out in the design and requirements phase which scenario you need, and set the database parameter correctly:

  • database auto-commit true or false

Next, I want to focus on the database.autoCommit(false) scenario, because there we differentiate between actions: for some actions, transactions are not necessary.
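A minimal JDBC sketch of the “manual” scenario; the in-memory H2 URL and the account table are made up for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class ManualTransactionDemo {

        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo")) {
                conn.setAutoCommit(false); // the developer controls the transaction boundaries
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPDATE account SET balance = balance - 10 WHERE id = 1")) {
                    ps.executeUpdate();
                    conn.commit();   // success: make the change permanent
                } catch (SQLException e) {
                    conn.rollback(); // failure: undo the whole unit of work
                    throw e;
                }
            }
        }
    }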

Just because I have to mention it: a reliable transactional system must comply with the ACID criteria (atomicity, consistency, isolation, and durability). When you use a transactional system, you just need to know that you can rely on the ACID criteria; you do not need to be concerned with each one of them in depth. However, the way you use transactions could violate one of the criteria. It is not a misbehavior when a database rolls back because you threw an exception that in fact should only have been a warning. The conclusion is:

  • you can rely on the ACID criteria the transactional system complies to*

Spring Framework transaction abstraction

Let’s get specific and take Spring as a framework that can help you take control of the database’s transactional system. Spring is not the actual transactional system, but it exposes an interface that integrates with different transactional systems; it is called the Spring Framework transaction abstraction.

To get started you need to:

  • define the correct PlatformTransactionManager implementation, usually through dependency injection (see the sketch after this list)
  • use the TransactionStatus interface to control transaction execution and query transaction status
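For instance, a minimal sketch for plain JDBC; the configuration class name is made up, and other data access technologies need other PlatformTransactionManager implementations:

    import javax.sql.DataSource;

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.datasource.DataSourceTransactionManager;
    import org.springframework.transaction.PlatformTransactionManager;

    @Configuration
    public class TxConfig {

        // the transaction manager implementation matching plain JDBC:
        // it manages transactions on connections from the given DataSource
        @Bean
        public PlatformTransactionManager transactionManager(DataSource dataSource) {
            return new DataSourceTransactionManager(dataSource);
        }
    }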

The Spring transaction abstraction offers a lot of flexibility when it comes to controlling the transactional system. The implementation you choose must be a trade-off between how tightly coupled you want/need to be to Spring’s transaction infrastructure and the need for a non-invasive lightweight container which has less impact on application code. In this regard, you need to choose between:

  • programmatic transaction management in Spring
  • declarative transaction management in Spring (XML or annotation based approach)

You can check out the differences in more detail in the Spring documentation. And a few words on how to choose between them are mentioned here.

Some general advice when using Spring transactions:

  • You are strongly encouraged to use the declarative approach to rollback, if at all possible. Spring docu
  • When using proxies, you should apply the @Transactional annotation only to methods with public visibility. If you do so on methods with other access modifiers (protected, private, or package-visible), no error is raised, but the annotated method does not exhibit the configured transactional settings. Spring docu
  • Spring recommends that you only annotate concrete classes (and methods of concrete classes), as opposed to annotating interfaces. Spring docu
  • A @Transactional annotation on a method takes precedence (is more important) over the transactional settings defined at the class level (see the sketch below). Spring docu
* of course, unless you need to debug exactly that and find out that an enterprise-ready system has bugs – bad luck…
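A small sketch of that last bullet point, with made-up class and method names:

    import org.springframework.transaction.annotation.Transactional;

    @Transactional(readOnly = true) // class-level default for all public methods
    public class CatalogService {

        public void read(long id) {
            // runs with the class-level settings: readOnly = true
        }

        // the method-level annotation takes precedence over the class-level one
        @Transactional(readOnly = false, timeout = 5)
        public void updatePrice(long id, double price) {
            // runs read-write, with a 5-second timeout
        }
    }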

To be continued…

Or just skip to a blog post I found recently written by Marco Behler.

Ready to connect to the Semantic Web – now what?

As an open data fan, or as someone who is just looking to learn how to publish data on the Web and distribute it through the Semantic Web, you will be facing the question: “How do I describe the dataset that I want to publish?” The same question is asked by people who apply for a publicly funded project at the European Commission and need a data management plan. Next, we are going to discuss possibilities that help describe the dataset to be published.

The goal of publishing the data should be to make it available for access or download and to make it interoperable. One of the big benefits is making the data available to software applications, which in turn means the datasets have to be machine-readable. From the perspective of a software developer, some information beyond just name, author, owner, and date would be helpful:

  • the condition for re-use (rights, licenses)
  • the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
  • technical specifications to retrieve and parse an instance (a distribution) of the dataset (format, protocol)
  • the features/dimensions covered by the dataset (temperature, time, salinity, gene, coordinates)
  • the semantics of the features/dimensions (unit of measure, time granularity, syntax, reference taxonomies)

To describe a dataset, it is always best to look first at existing standards and existing vocabularies. The answer is not found by looking at only one vocabulary, but at several.

Data Catalog Vocabulary (DCAT)

DCAT is an RDF Schema vocabulary for representing data catalogs. It can describe any dataset, whether standalone or part of a catalog.

Vocabulary of Interlinked Datasets (VoID)

VoID is an RDF vocabulary, and a set of instructions, that enables the discovery and usage of linked datasets. It expresses metadata about RDF datasets.

Data Cube vocabulary

The Data Cube vocabulary is focused purely on the publication of multi-dimensional data on the Web. It is an RDF vocabulary for describing statistical datasets.

Asset Description Metadata Schema (ADMS)

ADMS is a W3C standard developed in 2013 and is a profile of DCAT, used to describe semantic assets.

You will find only partial answers on how to describe your dataset in the existing vocabularies; some aspects are missing or complicated to express.

  1. Type of data – there is no specific property for the type of data covered in a dataset. This value should be machine-readable, which means it should be standardized, possibly as a URI which is dereferenceable to a thing. And this ‘thing’ should be part of an authority list/taxonomy, which does not exist yet. However, one can use adms:representationTechnique, which gives more information about the format in which a dataset is released. This points only to dcterms:format and dcat:mediaType.
  2. Technical properties, like format, protocol, etc.
    There is no property for protocol, and again these values should be machine-readable, standardized, possibly as a URI.
    VoID can help with the protocol metadata, but only for RDF datasets: dataDump, sparqlEndpoint.
  3. Dimensions of a dataset.
    • SDMX defines a dimension as “a statistical concept used, in combination with other statistical concepts, to identify a statistical series or single observations.” Dimensions in a dataset can therefore be called features, predictors, or variables (depending on the domain). One can use dc:conformsTo with a dc:Standard if the dataset dimensions can be defined by a formalized standard. Otherwise, statistical vocabularies can help with this aspect, which can become quite complex. One can use the Data Cube vocabulary, specifically qb:DimensionProperty, qb:AttributeProperty, qb:MeasureProperty, and qb:CodedProperty, in combination with skos:Concept and sdmx:ConceptRole.
  4. Data provenance – there is dc:source, which can be used at the dataset level, but there is no solution if we want to specify the source at the data record level.

In the end, one needs to combine different vocabularies to best describe a dataset.
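As a small illustration of such a combination, here is a sketch using Apache Jena; the dataset URI and all the values are made up:

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.DCAT;
    import org.apache.jena.vocabulary.DCTerms;
    import org.apache.jena.vocabulary.RDF;

    public class DatasetDescription {

        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            // a DCAT dataset description combining DCAT and Dublin Core terms
            Resource dataset = model.createResource("http://example.org/dataset/ocean-temperatures")
                    .addProperty(RDF.type, DCAT.Dataset)
                    .addProperty(DCTerms.title, "Ocean temperature measurements")
                    .addProperty(DCTerms.license,
                            model.createResource("http://creativecommons.org/licenses/by/4.0/"))
                    .addProperty(DCTerms.format, "text/csv");
            model.write(System.out, "TURTLE");
        }
    }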

Add a dataset

The tools out there that help with publishing data seem to be missing one or more of the above-mentioned parts.

  • CKAN maintained by the Open Knowledge Foundation uses most of DCAT and doesn’t describe dimensions.
  • Dataverse created by Harvard University uses a custom vocabulary and doesn’t describe dimensions.
  • CIARD RING uses the full DCAT-AP with some extended properties (protocol, data type) and local taxonomies with URIs mapped, when possible, to authorities.
  • OpenAIRE, DataCite (using re3data to search repositories) and Dryad use their own vocabularies.

The solution to these existing issues seems, in general, to be introducing custom vocabularies.


Developing for the Semantic Web

This year’s DevFest was again a blast!

I had the opportunity to hold a presentation about what I have been doing lately: a web application to show off the power of SPARQL. I turned my experience into an introduction on developing for the Semantic Web.

Take a look:

My video from DevFest:

DevFest Vienna Website.

Spring Boot and Polymer

Last week was the Google I/O developer conference, and Polymer 1.0 was presented. So my curiosity was finally sparked, and I made some time to check it out a little bit. I was looking for a fast way to create a Java web application where I could use Polymer, and I had heard how easy and fast Spring Boot is.

So voilà, my first Java web app with Spring Boot and Polymer 1.0. You can clone it from GitHub and use it as an archetype – the Polymer files are already included in the project (also for learning purposes). I used Maven to build the project, which is also easy, but one can also use Gradle.

https://github.com/theRealImy/SpringBootPolymer

Using Spring Boot was super easy! One can simply follow the Getting Started guide.
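The whole entry point is just a few lines; a sketch of what mine boils down to (the class name is illustrative):

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;

    @SpringBootApplication
    public class Application {

        public static void main(String[] args) {
            // starts the embedded server and serves the static Polymer files
            SpringApplication.run(Application.class, args);
        }
    }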

Polymer’s home is here.

The only issue I encountered was that the index.html was not displayed. After a bit of reading in the Spring Boot docu, you find:

Do not use the src/main/webapp directory if your application will be packaged as a jar…

By default Spring Boot will serve static content from a directory called /static (or /public or /resources or /META-INF/resources) in the classpath or from the root of the ServletContext.

Sure enough, I changed the folder name and it worked.

Have fun!

Data Statistics View Project

Today is my last day at my project assistant job at the Vienna University of Technology. I did some summing up of my work and polished the TwitterAPI code and also the Data Statistics View code. I want to share my implementation of the Data Statistics View. It was done with HTML, PHP, JavaScript, and SQL. The project can be used for any data types stored in a SQL database.

One of my tasks at the university was to download data from the Twitter public stream and analyse it. This work was made easier by a tool that allows visualizing the number of downloads per hour/day/month.
The API I used to download tweets is based on Adam Green’s implementation called 140dev. He also has a visualizing tool for the downloaded tweets; however, that one has less to do with numbers and much more with the tweet texts.

The code for my implementation can be found on my GitHub repository.
It contains simple bar charts of the number of tweets downloaded.

Bar chart example

Working with the Twitter public stream, I ran into a lot of questions, to which I sometimes found answers and sometimes did not:

  • How can one download tweets only for a specific country?
  • When is the rate limit reached?
  • If the rate limit is reached, how long do I have to wait until I can download again?
  • Why do some Twitter user accounts work and some do not?

And so on…

These questions were only one part of my time at the university; about the rest, I will probably tell in another post.

AngularJS Workshop

During Code Week Vienna 2014, the Google Developer Group (GDG) Women Vienna organized an AngularJS beginners’ workshop. I had the honor of being the trainer of the workshop.

The experience of delivering a training is entirely different from learning a tutorial on your own. I learned a lot already while preparing the tutorial. I also got some feedback from the participants, and mostly I noticed what was missing while I was presenting. I wrote down what went wrong and also what was good. From the feedback and my own observations, I came up with a new, improved presentation and more organized tutorials.


Skeleton application: Angular seed
Tutorial page: Intro to AngularJS
Tutorial code: https://github.com/theRealImy/testAngular
Project work: https://github.com/edemguru/angularWorkshop