Python is reigning!

When I started learning Spark back in 2016, Scala was the best option for writing Spark programs thanks to its concision and speed. PySpark was available to Python users back then too, but to use it you had to create a SparkContext at the very beginning and then go through some tedious setup steps.

By 2018, I find that the Spark community has officially created a new way for Pythonistas to interact with the Spark core. The savior here is called SparkSession, which bundles up a lot of things that normal users may not care about at all. With SparkSession, Python users can dive straight into data exploration, just as they like to do for machine learning. Though Scala still wears the crown for writing Spark programs, PySpark is now catching up.

Similarly, when I reviewed my knowledge of TensorFlow, I found that Google has provided two new high-level APIs on top of the original low-level ones. This has greatly lowered the cost in time and effort for Python users to get their hands on this great open-source deep-learning framework.

Java and other lower-level programming languages are probably still the best overall for those who care about performance. But Python would never have gained such huge popularity if its community were not so active in machine learning, thanks to third-party packages such as scikit-learn, pandas and NumPy. There's a good saying about this: when one rides with the wind, even a pig can fly.

Next Big Area for Data Analysis

Artificial intelligence? No, since it is already very popular and everybody wants a piece of it.

Blockchain technology? Nope, since it is more of a security matter, and data analysts do not play a main role in its development.

Then what are we talking about here? I want to define the "Big Area" in the title as something we could easily tap into with our current techniques, and that has not yet gone viral globally. The area I'd like to share my insights on is electronic gaming.

Yes, people might say that sports analytics is already mature, especially in developed countries, where top players have dedicated data analysts to improve their performance. However, those players can afford analysts partly because of the global recognition of their sports, and esports is not at that stage yet; it is still struggling to enter most people's field of vision. That is to say, data analytics in esports is closely tied to the growth of the industry itself, and I would say it has a prosperous future.

Additionally, esports is so well suited to data analysis that I cannot think of another sport with a more appropriate case for it. We have adopted high-speed computers to do complex calculations for us, and esports is itself played on computers. In other words, we can analyze anything we care about in a game to improve performance, because in theory all the data is available.

Esports data analysis inherits the advantages of traditional sports analytics; for example, you can't be a good analyst without experience or knowledge of the specific sport. It also has merits that traditional sports can't match, such as how easy it is to collect observable data. As the esports industry continues to grow, with competitive games such as Dota 2, League of Legends and Overwatch emerging, I believe it has quite a bright future for brave pioneers.

Raspberry Pi 3, OMG

I never imagined such a revolutionary product, a tiny computer even smaller than a floppy disk, would come into the world while I am only in my 20s. I learned of its existence three or four years ago, but I had no interest in buying one until recently, when I realized I might need a 24/7 computer that can run Linux.

For $35 in 2017, you could buy a Raspberry Pi 3 with WiFi and Bluetooth, four USB ports, one RJ45 Ethernet port, one HDMI port and an old-fashioned 3.5mm headphone jack. Its I/O is remarkably well rounded. It doesn't have a powerful CPU or GPU, and its RAM is limited to 1 GB, but who really needs more than that?

I bought one because, as I mentioned at the beginning, I need a Linux machine running 24/7 to serve as a gateway for my local network. I stumbled on this technique while researching how to improve the connections of game consoles. Yes, you read that right: the whole point of all this is simply to let me play online games, lol. I have owned a Nintendo Switch for quite a long time now, and since I came back to China the Internet here has been quite annoying, with strict NAT types everywhere and usually no public IP. All of those factors have made it impossible for me to connect to other players in my Switch games.

I then turned my research to the game proxies that claim to speed up connections to the outside world. They are essentially proxies or VPNs that can relay UDP traffic and effectively give you a public IP. On a PC you can easily do this by installing the right software, but it is hard on consoles, since you can't install software on them. Some people pay extra for a "smart router", which is basically a router running a Linux-like system on which you can install such "software". That made me wonder whether I could simply use a local machine as the router, and I soon found a technique called a transparent gateway: another machine on the local network serves as the gateway and redirects all Internet traffic to an outside server, and all I need to do is change a few IP settings on my Switch.

I have some experience with Linux, partly on my Mac but mainly on my VPS. This time, however, I was dealing with Debian on the Raspberry Pi instead of CentOS on my VPS; they differ in places, but not too much. The process was hard, and I don't want to go into too much detail in this post. In short, I installed a piece of "software" that redirects my traffic to an outside server with a public IP. First, the Raspberry Pi must be able to forward both TCP and UDP packets, which is done by adding some rules to iptables. Second, the server in the outside world must be able to handle UDP relay as well, and missing this was the mistake I made along the way. So in the end it is very simple, one step locally and one remotely, but the trial and error along the way could be daunting to a lot of people. Luckily that is not my case, because I wanted to solve this problem so badly that I spent hours after work until I couldn't keep my eyes open. (Shameless problem-solver personality.)
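As a rough sketch only: the iptables side of a transparent gateway usually looks something like the following. The interface name (`eth0`) and the relay port (`12345`) are placeholders, and the exact rules depend on the relay software you run; note that plain REDIRECT works for TCP but not cleanly for UDP, which is why UDP relay typically goes through TPROXY instead.

```shell
# On the Pi, as root: let the kernel forward packets at all
sysctl -w net.ipv4.ip_forward=1

# TCP: steer LAN clients' traffic into the local relay's
# transparent-proxy port (12345 is a placeholder)
iptables -t nat -A PREROUTING -i eth0 -p tcp -j REDIRECT --to-ports 12345

# UDP: REDIRECT loses the original destination for UDP, so
# transparent UDP relay is normally done with TPROXY plus a
# policy-routing rule that delivers marked packets locally
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
iptables -t mangle -A PREROUTING -i eth0 -p udp \
  -j TPROXY --on-port 12345 --tproxy-mark 1
```

The UDP half is exactly the part that is easy to get wrong: both the Pi and the remote server have to handle UDP relay, which matches the mistake described above.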

Now that there's a 24/7 smart "router" in my house, I see more opportunities for a smarter home. I remember bringing an Apple TV 4 back from the States but never getting it to work, because back then I had no machine that could relay traffic, so my Apple TV was terribly crippled by the Internet in China. I am proud to say I will bring it back to life when I am home for the upcoming Spring Festival, with my cute little Raspberry Pi, and believe it or not, it only cost me $35.

Please Hold On To Net Neutrality, America

It might sound weird at first to hear a Chinese guy shouting about an American issue, but if you understand the current state of the Chinese Internet, or if you have ever lived here, you'll realize right away what I am trying to say. IT WILL BE A STEP BACK, I PROMISE YOU.

I seriously believe this is a hot tech topic in the U.S. right now, but as you might imagine, there is almost no coverage of this news here, partly thanks to the fact that net neutrality is already gone in China. ISPs should never be granted the right to discriminate among their customers, and I'll use some examples to show what is going to happen.

To start with, exactly what you are worried about will happen: extra fees for heavier users, bundled website packages to segment the market, and so on. In China there are as many types of packages for business Internet as you can imagine; a foreign company that needs a more open Internet, in particular, is charged extra for "customized" service. None of this would be necessary if net neutrality were still in play. The Internet was invented to connect people across the globe; even if ISPs could allocate resources more efficiently by abandoning net neutrality, doing so violates a basic American ethic: everyone is born equal.

Secondly, if the government has the power to abolish a policy without consulting the majority of the tech community, what could happen in the future? ISPs might not only bundle whatever services they like but also censor your data as they please. Moreover, the government might step in and say: we are in charge now, so your data will be routed to the U.S. government before it leaves U.S. territory. In short, this has the potential to grant the government far too much authority, which should make a lot of us feel violated.

It reminds me of the death of Aaron Swartz, who challenged the copyright world with his programming skills and sincere motivation. It also reminds me of the people behind WikiLeaks, The Pirate Bay, Anonymous... The values and goals they promote to the world are strikingly similar: the Internet and knowledge should not belong only to the rich, the famous, or the powerful.

One final question for all of you: how could Donald Trump ever have become POTUS if the polls only served those who are "louder"? Please don't lose that core value, even if you are planning to turn your back on net neutrality.

A Visualization For Score-Cutoff-Like Strategies

It's been three straight months since I started work at Home Credit China, and I have to admit that a company this big takes new employees a long time to understand its business model. But thank god, time and intelligence together helped me overcome the difficulties; I am starting to get the hang of it, and I am going to share some techniques from my work that might help others.

Working in the decision-making department that controls underwriting strategies for loan applications, we use "scores" all the time. They can be scores calculated by the company itself, scores from other companies, such as the popular Zhima Score from Alipay in China, or a mix of both. For example, suppose we were Alipay and our rule was: to be credited any money, you need a Zhima Score higher than 590. One day you find that this rule rejects more customers than you would like, so you naturally want to adjust the cutoff value, right? You then simulate a new cutoff, say 580, to see whether the new strategy does any better. Simulation itself is also important, but I won't talk about it here.

However, that is only the simplest case, where you have a single score cutoff: you can just rerun the simulation again and again and draw a graph with the cutoff value on the x-axis and, for instance, the approval rate on the y-axis. What if you have many score-cutoff strategies? You can still analyze them one by one, rerunning the simulation for different cutoff combinations, but when the simulation data is large this becomes painfully slow, and you won't be able to see the interactions at first glance.
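To make the single-cutoff case concrete, here is a minimal sketch of the "all-over-again" approach, with made-up scores and a helper name of my own invention: rerun the approval rule at each candidate cutoff and record the approval rate.

```python
def approval_rate(scores, cutoff):
    """Share of applicants whose score clears the cutoff."""
    approved = sum(1 for s in scores if s >= cutoff)
    return approved / len(scores)

# Made-up Zhima-style scores for illustration
scores = [552, 563, 571, 580, 584, 590, 596, 601, 612, 633]

# Sweep the cutoff and collect the approval rate at each value;
# this is the curve you would plot (cutoff on x, rate on y)
sweep = {c: approval_rate(scores, c) for c in (570, 580, 590, 600)}
print(sweep)  # lower cutoffs approve more applicants
```

With one cutoff this is cheap, but with several interacting cutoffs the number of combinations to rerun explodes, which is exactly the problem described above.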

What I am proposing is an "incremental" method for the visualization instead of the "all-over-again" method. Run the simulation once, then modify the result. For instance, when you want to lower the Zhima Score cutoff, find the customers in the dataset who would newly qualify for an offer, and assign them a probability of being approved, such as 80% or whatever value you calculate. You can apply the same kind of modification to every other cutoff; because the strategies in a decision-making system usually run in order and are coded in a table with one line per strategy, you can break out exactly the segment of customers whose results might change and weight them by probability. It looks inaccurate at first glance, but when your data is large, it is more precise than you would think.
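The incremental idea can be sketched like this (the record layout and the 80% approval probability are my own illustrative assumptions): instead of rerunning the whole simulation, take the customers who failed only the old cutoff but would pass the new one, and add them to the approved count weighted by an assumed approval probability.

```python
def incremental_approval_rate(records, old_cutoff, new_cutoff, p_approve):
    """Estimate the approval rate after lowering a score cutoff.

    records: list of (score, approved_under_old_strategy) pairs
    p_approve: assumed probability that a newly eligible customer
               clears the remaining strategies and gets approved
    """
    base_approved = sum(1 for _, ok in records if ok)
    # Customers rejected before whose score clears the new, lower cutoff
    newly_eligible = sum(
        1 for score, ok in records
        if not ok and new_cutoff <= score < old_cutoff
    )
    expected = base_approved + p_approve * newly_eligible
    return expected / len(records)

records = [(552, False), (563, False), (571, False), (580, False),
           (584, False), (590, True), (596, True), (601, True),
           (612, True), (633, True)]

# Lower the cutoff from 590 to 580, assuming 80% of the newly
# eligible customers would end up approved
print(incremental_approval_rate(records, 590, 580, 0.8))
```

Because the heavy simulation runs only once, an estimate like this is cheap enough to recompute on every slider move, which is what makes the interactive visualization feasible.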

And the huge upside of this method is that you can visualize the interactions among several score cutoffs AT THE SAME TIME! You can play with the modifications and see their overall impact on the variable you care about. In my example you can drag the sliders for the different score cutoffs and watch how much they move the approval rate and the risk performance. At the very end, when you find the perfect combination, simply rerun the full simulation to confirm it.

And trust me, the result won't be far from the visualization, because the visualization has already told you what is going to happen.

Project link here: Github Link

Signing Up for Dataquest!

Since I started working, my job has mainly consisted of pulling data from the database with SQL and then throwing it into Excel for analysis; to be honest, Excel is enough for most analyses at most companies. Although my supervisor has never constrained which tools his employees use, most of my colleagues have settled on Oracle SQL + Excel.

This has never been the case for me. As one of only two people on my team who can code, I started automating some processes (especially the annoying, repetitive ones) in Python, together with a colleague from a computer-science background (thank god I taught myself Python). It was pretty funny when he learned that I can even code in HTML and CSS; I believe he will never underestimate a business student's coding skills again. Another interesting thing about my job is that one colleague kept asking me questions about the R programming language, yet one day when I asked her some very basic operational questions, she got angry for no reason and shouted, "Why don't you look it up yourself?" Since then I have never talked with this self-centered person and have avoided every chance of conversation. Please, be respectful to others, will you?

Alright, back to the topic of continued learning after graduation. Apart from studying Japanese every day, I sincerely believe I need to keep learning professional skills to stay competitive over the next couple of years. TensorFlow is what others are learning, and I took a look at it: a very promising package from Google, but too cumbersome to learn and code in. Not long after, Keras came into view, and I think it is a perfect package for deep-learning beginners. After some research online, I found two amazing websites for people who want to learn more about data science: DataCamp and Dataquest.

I really wanted to try both at the same time, but each requires a subscription to access all the available courses. DataCamp is more R-focused with growing Python content, while Dataquest is more Python-focused, with clear paths toward becoming an expert in data analysis with Python. A no-brainer, isn't it? Having learned R extensively over the past two years, I was more than happy to find a site that specializes in teaching data analysis in Python, so I started my learning on Dataquest. I might sign up for DataCamp later as well, but that depends on how fast I can get through Dataquest's content (I am learning at flying speed because I want to save some money).

Whichever service I choose, my ultimate goal is to get my hands on deep learning, building on the machine-learning knowledge I gained at USC. Being able to understand it as well as to do it will greatly enlarge the boundary of who I can become. I just want to be better by learning things that interest me, and I know I will.

Thoughts on Ulysses changing to subscription-based

I have been using Ulysses for quite some time and consider myself among the first group of adopters of this application. Ulysses feels elegant, fast and effective, and it has been my preferred writing app.

When Ulysses announced its change from a one-time-purchase program to a subscription-based app, I was not shocked at all, since in recent years 1Password, Adobe Photoshop, PyCharm and other excellent pieces of software have all moved from free or paid-up-front to subscriptions. This is not necessarily bad for consumers, if the move helps developers continue their work for a long, long time. On the other hand, it does add financial pressure on customers who want any new feature, because most developers release the new version immediately, pull the old one from the store, and give existing customers no time to think it over.

Honestly speaking, I don't think customers who paid such a high price for such an excellent app would be mad about the switch itself. As one of them, what makes me unhappy is their failure to notify existing customers beforehand. Even nearly ten years after the concept of the "App Store" was created and implemented, we are still searching for the best way for both developers and customers to benefit from its booming growth. Maybe the platforms take too large a cut from developers? Or maybe customers pay too little for some applications? Only time will tell.

Life is a Journey of Accumulation

Sometimes I wonder what made me who I am today, and there are tons of possible answers in my head. It might be my interests, my character, the environment I grew up in, the people I have met in my life, or simply destiny.

For example, I've been studying wine culture and Japanese recently, but I constantly forget the new things I have just memorized. I don't want to blame my seemingly declining learning ability; rather, I have found that being good at something takes serious time in practice. To be precise, I spent quite some serious time learning the things I am good at today, instead of repeating them mindlessly without ever wondering how to improve.

This comes down to a book on "deliberate practice" that I read over the past couple of weeks. Looking back on my journey as a tech lover, there were also times when I kept forgetting new things I had just studied. Gradually, however, I started to pick them up more and more quickly, and that is when I realized I had actually accomplished something in learning a new skill. So my point is: don't see yourself as a slow learner, because none of us was ever that fast. The reason we are now good at something is that we spent countless hours mastering it, wholeheartedly. We tend to forget how much effort we put in, especially once we are comfortable using the skill.

There is just too much great stuff to learn in the world, and I used to be too keen on learning all of it. After all these years, I have also learned to enjoy myself a little when it is time to do so. And no matter what I want to do, good or bad, hard or easy, I want to remember the art of patience in learning; better late than never. 🙂

DotA 2 – A Modern MOBA Giant

I was never a hard-core gamer before. Although I have played good RTS games like StarCraft, Red Alert 2 and Age of Empires, as well as the famous FPS Counter-Strike, I had never spent a second on a multiplayer online battle arena (MOBA) game. I can still clearly remember when all of my friends were crazy about Warcraft and Defense of the Ancients; that was the first time I understood that some games are genuinely popular. But I never tried one myself until recently, when I got my hands on DotA 2.

First things first: DotA 2, as a flagship MOBA, is practically impossible to play on mobile devices or game consoles without a mouse, so it has only been released on PC and Mac. The same goes for Riot Games' well-known League of Legends, which is accessible only on computers. The reason is that these MOBA games run so deep, with tons of options and settings, and demand many precise clicks and drags throughout a session. A typical DotA 2 match is 40 to 50 minutes of uninterrupted 5v5 competitive play. That also makes DotA 2 a team game: you need teammates cooperating to destroy the other side's base, and vice versa.

It is an extremely difficult game to pick up, given its depth of gameplay, but once you are in, you are in for life. If you are a casual gamer it might not be the one for you, but if you are a purist, you will find very little to complain about in its execution of the MOBA formula. Being such a deep and well-balanced game, it is not without drawbacks: it is too harsh on newbies, and it is not mobile-friendly. Its complexity is a gem but also a liability. It also limits customization of skins and items, a bit like what Apple does with iOS: restricting how players can change the game's look while selling exclusive skins to attract hard-core fans.

That said, it is a great game to learn from if you are interested in the MOBA market, though it is important to keep any dissection simple so as not to lose readers. Analyzing DotA 2 could truly fill an entire book with details and complexity.

Main Criterion for Buying a Gadget/Service/App/…

There are tons of things coming out every day in this fast-growing technology world, and customers like me have tons of choices when it comes to buying anything. So what standard do I use to avoid the many bad products on the market?

When you face several alternatives for a specific type of service, for example a GTD app on your smartphone, on iOS you will typically see OmniFocus 2, 2Do, Wunderlist and so on. My choice is the Omni Group's OmniFocus 2, an easy decision for me, because of the company and history behind the app. The Omni Group has been an excellent and consistent developer for both macOS and iOS for over ten years, and I have no reason to choose another app when I know I can get steady service from a company I trust. When you buy an app, you are buying not only the service now but also the service in the future, since nobody hopes for an app that gets pulled from the App Store or discontinued by the developer. When Realmac's app Ember was discontinued, for instance, many customers complained that the developer had not done enough for loyal users of such an expensive, professional app. If people spend $60 on an app and get only one year of use, that is totally unacceptable, and it costs the developer's other apps their customers' trust.

To take this discussion further: what if you want to buy a car? The first thing to consider is, again, the company and history behind the brand. Take Volkswagen: the diesel scandal truly made some customers distrust the company, but the history behind the brand still lets it attract new customers worldwide. Or Samsung: we all know what happened to its 2016 flagship smartphone, the Note 7, yet its strong background and good history helped it make a comeback this year with the new S8. It may sound like I am persuading people to buy "old-fashioned" stuff instead of trying new things, but I am not. I, too, like to try new things; for example, I went and grabbed a Nintendo Switch before I no longer could:

My criterion for buying or trying new things is also easy to state: buy the things with overwhelmingly positive comments and reviews. The Nintendo Switch happens to be one of those great things, so before hopping on the flight back to China, where I can't get one as easily as I can here, I grabbed one from Amazon last week when it was available. Okay, perhaps this contradicts what I said in my last article, but the Switch is so damn good! Another good exercise of my standard, lol!

Especially in the tech world, Kickstarter campaigns and beta projects from some companies sell customers only concepts or ideas, and what you buy may not be worth owning. If you are a tired customer like me, buy things that are well tested and come from a well-known company. If you want to try something new, buy things with an almost one-sided positive image. Any final words? Yes: please avoid flying United Airlines; it is just not a reliable company.