Updated on 29 May 2019

UBER: open vs close
Figure 1: UBER open vs close. Raw stock price data was pre-processed, or transformed, with a Kotlin program and exported to a CSV format. The results were then plotted. Data source: Alpha Vantage.

Introduction

Data is a fundamental part of the modern world. The following stock data is for UBER as traded on the NYSE.

{
  "Meta Data": {
    "1. Information": "Daily Time Series with Splits and Dividend Events",
    "2. Symbol": "UBER",
    "3. Last Refreshed": "2019-05-28",
    "4. Output Size": "Compact",
    "5. Time Zone": "US/Eastern"
  },
  "Time Series (Daily)": {
    "2019-05-28": {
      "1. open": "41.7000",
      "2. high": "41.8000",
      "3. low": "40.6000",
      "4. close": "40.9500",
      "5. adjusted close": "40.9500",
      "6. volume": "12509106",
      "7. dividend amount": "0.0000",
      "8. split coefficient": "1.0000"
    },
    "2019-05-24": {
      "1. open": "41.2800",
      "2. high": "41.5100",
      "3. low": "40.5000",
      "4. close": "41.5100",
      "5. adjusted close": "41.5100",
      "6. volume": "8786751",
      "7. dividend amount": "0.0000",
      "8. split coefficient": "1.0000"
    },

It consists of a series of nested JSON objects (or dictionaries, or maps, depending on your point of reference). How would you go about parsing this data to extract a series of prices where there may be hundreds of days?

Normally, I would choose something like Python or JavaScript to handle this kind of pre-processing. However, for the purpose of evaluation and learning, I chose to use Kotlin. I will show what I had to go through as someone unfamiliar with this trending language but experienced with others. I also decided to use the Klaxon JSON parser so I would have a relatively nontrivial experience as compared to a Hello World that doesn’t require dependencies.

Getting started

My goal was to do a kind of test-driven development by running my experiment through a test runner.

Since Kotlin is made by JetBrains, IntelliJ IDEA makes for a natural IDE choice to start with. Choosing File > New > Project > Kotlin didn’t give me a clue about how to set up testing in Kotlin. Neither did File > New > Project > Gradle > Kotlin/JVM. Neither approach included a template for running tests.

After fumbling a bit, I stumbled onto gradle init and found it produced a testing target. The following is the result of my init.

$ gradle init

Starting a Gradle Daemon (subsequent builds will be faster)

Select type of project to generate:
  1: basic
  2: cpp-application
  3: cpp-library
  4: groovy-application
  5: groovy-library
  6: java-application
  7: java-library
  8: kotlin-application
  9: kotlin-library
  10: scala-library
Enter selection (default: basic) [1..10] 8

Select build script DSL:
  1: groovy
  2: kotlin
Enter selection (default: kotlin) [1..2] 2

Project name (default: kotlin-json-processor):
Source package (default: kotlin.json.processor): com.ikiapps.kotlinJSONProcessor

BUILD SUCCESSFUL in 44s
2 actionable tasks: 2 executed

Thereafter, I was able to import my manually created project into IntelliJ and see the main and test directories within it. I was then able to run the bundled test in the test runner! I felt like I had made great progress at this point but ran into other problems.

Dependency management

The installation documentation for Klaxon consists of the following text.

repositories {
    jcenter()
}

dependencies {
    implementation 'com.beust:klaxon:5.0.1'
}

It doesn’t specify where to put that. After some poking around, the Gradle Kotlin DSL file build.gradle.kts seemed like the right place.
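The snippet quoted from the Klaxon documentation is written in Gradle's Groovy syntax, while the project generated above uses the Kotlin DSL. A rough equivalent for build.gradle.kts might look like the fragment below. It is only a sketch: the plugins block that gradle init generates is left out, and the repository and version are simply the ones quoted above.

```kotlin
// build.gradle.kts (fragment): Kotlin DSL form of the Groovy snippet quoted above.
repositories {
    // Repository as named in the quoted Klaxon documentation.
    jcenter()
}

dependencies {
    // In the Kotlin DSL a dependency is a function call taking a string,
    // so the coordinates need parentheses and double quotes.
    implementation("com.beust:klaxon:5.0.1")
}
```

The main difference from the Groovy form is the call syntax; the coordinates themselves are unchanged.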

Did I mention that Gradle can be daunting to the uninitiated? Considering it’s a cross-platform build system and dependency manager, it’s covering a whole lot of bases. It seems quite powerful in its range of capabilities.

Programming

Getting down to coding after solving the build and dependency management overhead was exciting. The next big challenge was understanding Kotlin types. Not having a significant Java background probably slowed me down. However, I’ve been doing so much Swift that static types are a perfectly comfortable concept.

Some of my first attempts to parse the JSON are listed below along with my comments.

val data1 = Klaxon().parse<String>(File(pathname).toString())
// This just tried to parse the filename.

val data2 = Klaxon().parse<File>(StringReader(File(pathname).toString()))
// What's a Reader? I think that's what I need.

val data3 = Klaxon().parse<String>(File(pathname).readText())
// Oh, there's a readText(). Why didn't this work? It seems it's the wrong parse call even though the types match.

val data4 = Klaxon().parseJsonObject(StringReader(File(pathname).readText()))
// There we go. I get two keys as a result.
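The first two attempts went wrong because File.toString() returns the file's path rather than its contents, whereas readText() reads the contents into a String. The sketch below illustrates the difference with the standard library only. The helper name pathVersusContents and the temporary file are made up for the illustration and do not come from the original project.

```kotlin
import java.io.File

// Returns the File's string form and its contents side by side.
// (Hypothetical helper, for illustration only.)
fun pathVersusContents(file: File): Pair<String, String> =
    Pair(file.toString(), file.readText())

fun main() {
    // A throwaway file standing in for the downloaded JSON.
    val tmp = File.createTempFile("uber", ".json")
    tmp.writeText("""{"Meta Data": {}}""")

    val (asString, contents) = pathVersusContents(tmp)
    println(asString == tmp.path) // toString() is just the path
    println(contents)             // readText() is the JSON text itself
    tmp.delete()
}
```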

Having obtained a syntax allowing me to extract the top-level keys in the JSON data, Meta Data and Time Series (Daily), I was hopeful that I’d be able to get out more. I thought that it would be nice if the days were arranged in an array; then I could do a parseArray and iterate over the members.

With further experimenting, it turned out the nested dictionaries weren’t as terrible as they first seemed. I found I could get the time series data by filtering my parse results to isolate the time series and then mapping it to get a result. My code is below; note that no return statement is used, because the last expression in a Kotlin lambda is its result.

val parsed = klx.parseJsonObject(StringReader(f))
val timeSeries = parsed.filter {
    it.key == dataKey
}.map {
    it.value as JsonObject
}

My result in timeSeries is a list holding a single JsonObject, as defined by Klaxon. I tried many variations of the parse function found in Klaxon on this object until I came upon parseFromJsonObject(JsonObject). That function allowed parsing the already parsed JSON text, now represented as a JsonObject.
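Since the parsed object is used like a map here (filter on it.key, then values), the filter and map steps can be tried on an ordinary Kotlin map. The sketch below uses a plain Map as a stand-in, not Klaxon itself, with trimmed sample values. It suggests that filter keeps a Map, that map turns it into a one-element List, and that looking the key up directly reaches the same value. The function names are made up for the illustration.

```kotlin
// A plain Map standing in for the parsed JSON (assumption: trimmed sample data).
val sample: Map<String, Any?> = mapOf(
    "Meta Data" to mapOf("2. Symbol" to "UBER"),
    "Time Series (Daily)" to mapOf(
        "2019-05-28" to mapOf("4. close" to "40.9500"),
        "2019-05-24" to mapOf("4. close" to "41.5100")
    )
)

// filter on a Map returns a Map; map then turns its entries into a List.
fun viaFilterAndMap(data: Map<String, Any?>, key: String): List<Map<*, *>> =
    data.filter { it.key == key }.map { it.value as Map<*, *> }

// Looking the key up directly avoids the single-element list.
fun viaLookup(data: Map<String, Any?>, key: String): Map<*, *>? =
    data[key] as? Map<*, *>

fun main() {
    val key = "Time Series (Daily)"
    val filtered = viaFilterAndMap(sample, key)
    println(filtered.size)                              // one match, hence first()
    println(filtered.first() == viaLookup(sample, key)) // both routes reach the same map
    println(viaLookup(sample, key)?.keys)               // the date keys
}
```

The filter-then-map route keeps everything in one chain, which suits the style used in this post; the direct lookup is shorter when only one key is needed.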

Since everything I was interested in was in the first (read that as “only”) filtered member, I could pull it and “forEach” the days. I read about data classes in the docs and it took me some time to figure out how to put them into play. They are essentially a model of the data you are working with and can be used during extraction.

The code form eventually looked like the following snippet:

timeSeries.first().values.forEach {
    val day = klx.parseFromJsonObject<Daily>(it as JsonObject)
}

where a Daily is defined as

data class Daily
(
    @Json(name = "1. open")
    val open: String,
    @Json(name = "2. high")
    val high: String,
    @Json(name = "3. low")
    val low: String,
    @Json(name = "4. close")
    val close: String,
    @Json(name = "5. adjusted close")
    val adjustedClose: String,
    @Json(name = "6. volume")
    val volume: String,
    @Json(name = "7. dividend amount")
    val dividendAmount: String,
    @Json(name = "8. split coefficient")
    val splitCoefficient: String
)

Since all the key-value pairs were defined as String: Any, I wasn’t going to be able to use types to separate the individual elements. My goal was to recursively descend each day member in the JSON tree while being able to access individual days.

I liked being able to define a data schema for only the part I was interested in. It would have been tedious to model the entire data structure just to get the dailies. The @Json annotations, provided by Klaxon, were elegant at handling key name mappings.

Interlude

I wrote tests but had yet to assert anything. I chose to add a check on the count of days with:

@Test fun testParseDaily1()
{
    val cnt = 0
    val dataKey = "Time Series (Daily)"
    val klx = Klaxon()
    val f = File(pathname).readText()
    val parsed = klx.parseJsonObject(StringReader(f))
    val timeSeries = parsed.filter {
        it.key == dataKey
    }.map {
        it.value as JsonObject
    }
    timeSeries.first().values.forEach {
        // inc() returns a new Int and the result is discarded here, so cnt stays 0.
        cnt.inc()
        val day = klx.parseFromJsonObject<Daily>(it as JsonObject)
    }
    // Expected value goes first, then the actual value.
    assertEquals(0, cnt, "Expected fall-through.")
}
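One detail in the test above is worth spelling out. Whatever else is going on, cnt is declared with val, and Int.inc() returns a new value rather than changing the receiver, so the counter stays at 0 however many days are visited. The sketch below isolates that behaviour with the standard library only. Both function names are hypothetical, and the list of dates stands in for the parsed days.

```kotlin
// Calls inc() and discards the result, as in the test above.
fun countWithDiscardedInc(items: List<String>): Int {
    val cnt = 0
    items.forEach { cnt.inc() } // inc() returns cnt + 1; cnt itself is unchanged
    return cnt
}

// Reassigns a var instead, so the count accumulates.
fun countWithVar(items: List<String>): Int {
    var cnt = 0
    items.forEach { cnt++ }
    return cnt
}

fun main() {
    val days = listOf("2019-05-28", "2019-05-24")
    println(countWithDiscardedInc(days)) // prints 0
    println(countWithVar(days))          // prints 2
}
```

Since forEach on a collection runs its lambda before moving on to the next statement, a var counter like the second version may be all that a count assertion needs.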

What happened, at first, is that my test failed because the count fell through at zero, which I put down to async processing. Therefore, I set up a little blocking, in the form of an async-await pattern, to handle this via Kotlin coroutines.

import kotlinx.coroutines.async
import kotlinx.coroutines.runBlocking

// Blocks the calling thread until the count of days is available.
fun waitForCount() = runBlocking {
    val count = async {
        parseDaily()
    }
    count.await()
}

suspend fun parseDaily(): Int
{
    val dataKey = "Time Series (Daily)"
    val klx = Klaxon()
    val f = File(pathname).readText()
    val parsed = klx.parseJsonObject(StringReader(f))
    val timeSeries = parsed.filter {
        it.key == dataKey
    }.map {
        it.value as JsonObject
    }
    // Collect each parsed day; the list reference itself never changes, so val suffices.
    val days = ArrayList<Daily?>()
    timeSeries.first().values.forEach {
        val rslt = klx.parseFromJsonObject<Daily>(it as JsonObject)
        days.add(rslt)
    }
    return days.size
}

@Test fun testAsyncDailyCount()
{
    // No assertion yet; this only checks that the call completes.
    val result = waitForCount()
}

More functional

That’s not bad. I was able to make an ArrayList of the daily data I was interested in. However, this still felt a bit too imperative for my tastes. If I could somehow return the count of days without having to loop over an intermediate data structure, I might be able to factor out the async-await by having my assertion be performed on the result of a single operation chain. Here’s what I came up with:

@Test fun testParseDaily2()
{
    val expected = 13
    val klx = Klaxon()
    val dataKey = "Time Series (Daily)"
    val days = klx.parseJsonObject(StringReader(File(pathname).readText()))
        .filter { it.key == dataKey }
        .map { it.value as JsonObject }.first().values
        .map { klx.parseFromJsonObject<Daily>(it as JsonObject)?.close }
    // Expected value goes first, then the actual value.
    assertEquals(expected, days.size, "Bad count of ${days.size}, expected $expected.")
}

If you had answered the question of how you would do this, would your method involve more or less code? I’d be challenged to come up with a more concise result as the number of calls in this parsing chain can map 1:1 to the number of levels of data that are being descended. I’ve shared my version on GitHub.
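The caption of Figure 1 mentions exporting the result to CSV, a step the post does not show. A minimal sketch of how that might look follows, using the standard library only. The PriceRow type, the toCsv function and the column layout are assumptions made for the illustration, not the code from the repository, and the two sample rows are the days shown in the introduction.

```kotlin
// Hypothetical row type: the date key plus two of the string fields from Daily.
data class PriceRow(val date: String, val open: String, val close: String)

// Builds CSV text with a header row, oldest day first, converting prices to numbers.
// ISO dates (yyyy-MM-dd) sort correctly as plain strings.
fun toCsv(rows: List<PriceRow>): String =
    rows.sortedBy { it.date }
        .joinToString(separator = "\n", prefix = "date,open,close\n") {
            "${it.date},${it.open.toDouble()},${it.close.toDouble()}"
        }

fun main() {
    val rows = listOf(
        PriceRow("2019-05-28", "41.7000", "40.9500"),
        PriceRow("2019-05-24", "41.2800", "41.5100")
    )
    println(toCsv(rows))
}
```

Converting with toDouble() drops the trailing zeros from the quoted prices; keeping the original strings would be just as reasonable if the plotting tool parses them itself.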

Conclusion

One of the primary advantages heralded by Kotlin is concision. Yes, that’s actually a word and it means what it sounds like. Any programmer should be skeptical about such claims as programming has been going on for a long time and there’s only so much more that can be squeezed out. However, in this case, I’m inclined to be impressed.

Compared to alternative means of parsing an arbitrary nested data structure, what I was able to achieve with Kotlin has sold me on its promise to be concise. That translates into programming efficiency and enjoyment. Therefore, I see great potential in applying this compiled, statically-typed language to my one-off data pre-processing needs that would have typically been served by more dynamic scripting environments.

I’ve shown there is a bit of overhead in terms of setting up Gradle, but it is comparable to some of the alternatives I’m familiar with in other ecosystems, such as

  • CMake (C/C++)
  • npm/Yarn (JavaScript)
  • Carthage/CocoaPods (iOS/macOS)
  • virtualenv/pip (Python)
  • NuGet (.NET)
  • Go modules (Go)

Gradle gives access to everything in the Java world, and dependency handling feels precise and clean. Overall, it seems capable of addressing the needs of projects far more complex than this one.

In summary, Kotlin plus Gradle seems to hit a sweet spot (especially w.r.t. improving Java) among some of the other cross-platform, compiled options such as Xamarin (C#/F#) and Swift. And it just may beat out the scripting languages (JavaScript/Python), as I’ve learned by venturing into territory previously uncharted by me. At this point, that seems like a better bet than UBER.


