When I started to learn Spark back in 2016, Scala was the best option for writing Spark programs thanks to its conciseness and speed. PySpark was also available to Python users back then, but to use it, people had to create a SparkContext at the very beginning and then go through some tedious setup steps before anything else.
By 2018, I found that the Spark community had officially created a new way for Pythonistas to interact with the Spark core. The savior here is called SparkSession, which bundles up a lot of setup details that ordinary users may not care about at all. With SparkSession, Python users can dive right into data exploration, just as they are used to doing in machine learning. Though Scala still owns the crown for writing Spark programs, PySpark is now catching up.
Similarly, when I reviewed my knowledge of TensorFlow, I found that Google had provided two new high-level APIs on top of the original low-level API. This has greatly lowered the cost, in time and effort, for Python users to get their hands on this great open-source deep learning framework.
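As an illustration of what "high-level" buys you, here is a hedged sketch using the Keras API bundled with TensorFlow (tf.keras); the layer sizes are arbitrary placeholders of mine, not anything from a real model:

```python
# A minimal sketch of the high-level tf.keras API:
# a small regression model in a few lines, with no manual
# graph or session management.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                       # 4 input features
    tf.keras.layers.Dense(16, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(1),                         # single output
])
model.compile(optimizer="adam", loss="mse")
n_params = model.count_params()
```

With the low-level API you would define placeholders, variables, and a training loop yourself; here `compile` plus `fit` handles all of it.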
Java and other lower-level programming languages are probably still the best overall choice for those who care most about performance. Still, Python would not have gained such huge popularity had its community not been so active in the machine learning area, thanks to third-party packages such as scikit-learn, pandas, NumPy, and others. There's a good saying about this: when one rides with the wind, even a pig can fly.