After reading many blog posts looking back on 2017, I realized that I should write one too. It seems necessary and helpful.

In the middle of 2016, I resigned from an international IT company focused on desktop software, which I thought was out of date. I then joined a rising Internet company, where I am still working now.

The Internet is amazing, and so are Internet companies. Still, it took me a little time to find the area I wanted to devote myself to: Big Data. Hadoop had already made an appearance when I was pursuing my Master's degree, but I didn't study it further at the time. I am glad to have the opportunity to pick it up again, and I am very optimistic about Big Data's future.

In 2017, my focus was on Big Data, both at work and in my own learning.


The department I work for has hundreds of servers, and my team's job is to collect the logs from these machines and compute statistics over them. I am fortunate to be responsible for the latter.

Each day's logs amount to terabytes, which is very challenging. The technology stack includes Flume, Kafka, Postgres, Redis, Spark, HBase, HDFS, etc. What interests me most is:

  • Spark
  • HBase
  • Elasticsearch

The amount of logs is huge, so we turned to Spark for help. As the official site says, Spark is a fast and general engine for large-scale data processing. Billions of logs collected by Flume are thrown into Spark.
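To give a flavor of what these jobs do, here is a toy, single-machine sketch of the kind of per-service aggregation we run. The field layout and names (`service`, `level`) are made up for illustration; the real jobs run distributed on the cluster, not in plain Python like this.

```python
from collections import Counter

def count_errors_by_service(log_lines):
    """Count ERROR lines per service from tab-separated log lines.

    Toy single-machine stand-in for a distributed Spark aggregation;
    the field layout here is invented for the example.
    """
    counts = Counter()
    for line in log_lines:
        service, level, _message = line.split("\t", 2)
        if level == "ERROR":
            counts[service] += 1
    return dict(counts)

logs = [
    "auth\tINFO\tuser logged in",
    "auth\tERROR\ttoken expired",
    "search\tERROR\ttimeout",
    "auth\tERROR\tbad password",
]
print(count_errors_by_service(logs))  # {'auth': 2, 'search': 1}
```

The same group-and-count shape maps directly onto Spark's `map`/`reduceByKey` (or a DataFrame `groupBy().count()`), just spread over many machines.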

The results exported from Spark were stored in HBase, which is very scalable and highly available. HBase's read and write performance is quite satisfactory. However, you cannot query HBase with SQL or another DSL, and given our business requirements, it's not uncommon to need complicated queries.
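Because HBase only gives you fast lookups and scans by row key, the row-key design carries most of the query power. The sketch below shows one common pattern for time-series data (not our exact production schema): a short hash prefix to spread writes across regions, plus a reversed timestamp so the newest entries sort first in a scan.

```python
import hashlib

LONG_MAX = 2**63 - 1  # max signed 64-bit value, used to reverse timestamps

def make_row_key(service, ts_millis):
    """Illustrative HBase row key: salt | service | reversed timestamp.

    HBase sorts rows lexicographically, so the salt avoids write hotspots
    and the reversed, zero-padded timestamp makes recent rows sort first.
    """
    salt = hashlib.md5(service.encode()).hexdigest()[:2]
    reversed_ts = LONG_MAX - ts_millis
    return f"{salt}|{service}|{reversed_ts:019d}"

k1 = make_row_key("auth", 1514764800000)  # earlier event
k2 = make_row_key("auth", 1514764900000)  # 100 s later
assert k2 < k1  # the later event sorts before the earlier one
```

Anything the row key cannot express, though, turns into a full scan with filters, which is exactly why complicated ad-hoc queries pushed us toward another store.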

It's time to introduce an alternative: Elasticsearch. We simply dump the raw logs into Elasticsearch and use Kibana to discover something useful.
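Dumping logs this way usually means batching documents through Elasticsearch's bulk API, whose request body is newline-delimited JSON: an action line followed by the document itself. The snippet below only builds that body; the index name and document fields are invented for the example, and actually sending it (e.g. via the official `elasticsearch` Python client or a `POST /_bulk`) is omitted.

```python
import json

def bulk_body(index, docs):
    """Build an Elasticsearch _bulk request body (NDJSON).

    Each document is preceded by an "index" action line; the whole
    body must end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

docs = [
    {"service": "auth", "level": "ERROR", "message": "token expired"},
    {"service": "search", "level": "INFO", "message": "query ok"},
]
body = bulk_body("logs-2017.12.01", docs)
print(body)
```

Once the documents are indexed, Kibana's Discover view and aggregations cover the complicated queries that were painful on HBase.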

In the first half of 2017 I dealt with Spark and HBase, in the second half with Elasticsearch. I went from newbie to slightly experienced, but I am still far from proficient.

In my spare time this year, I dug into books and projects about Big Data.


Many thanks to the three books that guided me back to Big Data:

The other books I read for recreation, which also helped me a lot, are listed here:

This year I developed a good habit that has benefited me enormously: writing blog posts in English. It gives me a chance to organize my thoughts while I learn; it's really a valuable tool.


In 2018, I want to become more experienced at Big Data, so these key areas need to be strengthened:

  • Scala. Both object-oriented and functional programming in Scala. Knowledge of the JVM is another must-have.
  • Python. Fluency, fluency, fluency. Python for AI is a big plus.
  • Distributed systems. Especially distributed storage. More theory, more source code reading.

Last but not least, English. More listening, more speaking, more reading, more writing. I must get a thorough grasp of this language, whether my future is in China or not.