Using a package to create a dashboard in R

I took DSO-510 Business Analytics this semester. For the final presentation, Prof. Selby asked us to build a webpage showing the results we had produced in SAS. This is a challenging task, since SAS is not known for fancy visualization. Without hesitating for long, I decided to use R instead.

However, R is not great at creating graphs either, at least for basic users. There are tons of settings and configurations you can tweak, but who on earth could get the hang of them right away? I asked some seniors for advice and got suggestions such as Shiny apps, a very good tool made by RStudio that embeds JavaScript. Although I like challenges, this time I hoped to make things easier, without spending too much effort on learning the tools or reinventing the wheel.

After researching some packages on the Internet, I came across a package that makes fancy plots incredibly easy to create. It is so easy to install that a single command in R sets everything up. It comes with a lot of color and design presets, and its official website has guidelines, tutorials and even example code to keep users on the same page. I believe anyone with experience in R will find it easy to learn. Most importantly, it can save interactive graphs as HTML files, a feature that is perfect for those who want to create dashboards.

Once I had enough interactive graphs, I started to think about how to make a dashboard. Fortunately I had seen some amazing dashboards in various presentations during this semester, so I think the primary factors are "overwhelming" and "informative". To create an "overwhelming" dashboard, I need to arrange the positions and sizes of the graphs well enough to give the feeling that the work is simply fabulous. To create an "informative" dashboard, the charts and plots need to be interactive, colorful and user friendly, and luckily the package has already done this for me. So now it is all about design.

1…2…3…Boom! Here comes the shitty work: R Dashboard

I am not a professional designer, but I don't like to ask real designers questions, because I believe that what interests me will turn out to be good design. To achieve that goal, I looked at successful dashboard webpages, then wrote and edited my code to mimic them. If anyone is interested in how to arrange the plots, please take a look at the source code of my webpage; I believe that is the best way to learn and to start a discussion. Happy learning!

Final Project of Managerial Statistics




This report is the final project for course GSBA-524 in Fall 2015.

The Olympics, the biggest sports event in the world, is hosted by a selected country every four years. Approximately 200 countries take part each time. Obviously, a lot of talented athletes fight hard to qualify for the Olympics and, hopefully, to win a medal proudly for their countries. However, as a member of the audience I had never thought about the mysteries and secrets of those medalists. What makes them excellent? What are the interesting facts about them?

For the final project of GSBA-524, I was asked to dig deeper into this topic, both as a member of the audience and as a potential data analyst. All I have is a dirty dataset that consists of details of Olympic medalists from 2000 to 2012, plus country GDP, population, HDI and ISO codes. To be honest, I am a big basketball fan, and I love Kobe so much that I also plan to make this report a tribute to my beloved 2008 USA basketball team.

Scrapy, Finally….

After months of experiencing American culture and adapting to the course level at USC, I finally got a chance to revisit my Python skills. Fortunately, I still remember the basic commands quite well; perhaps that shows how interest can drive an ordinary person to learn something deeply enough.

In short, I still want to improve my skill at grabbing content from different kinds of websites. Everybody knows there are tons of Python packages that can do that, but unless you major in computer science, you should be careful about their learning curves. During this semester, I have heard a lot of my classmates claim that Python is easy to learn, but COME ON, Python is after all a very powerful object-oriented programming language. If you have never touched programming before, it can drive you crazy. To sum up with an analogy, it's like learning to play the guitar: at first it might be easy, but if you want to play a BLUES SOLO on stage in front of thousands of people, you'd better invest a decent amount of time in it.


Select other columns when using functions in SQL

Thanks to INF 551, I finally got a chance to practice the SQL skills that I learned on my own. In this class, the instructor recommends that we use MySQL, which is not bad, but it is not as familiar to me as SQLite.

SQL is straightforward in most ways, but I ran into a tricky problem when I wanted to select another column while using aggregate functions such as sum() or count() in a SELECT clause. In Excel or R, I can easily specify which column to show under given conditions. However, it seems that SQL does not provide such a user-friendly way to do that.

For example, when I wanted to select the column next to the maximum of the sum of the "sales_amount" column, I ended up nesting two SELECT clauses in the query and using a WHERE clause to state the final condition. Note that the two clauses in brackets are almost identical except for their aliases.

I searched the Internet but could not find a more elegant way to solve this. Another method is to create a new table or view to serve as a substitute for the clauses in brackets.
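One possible alternative, sketched with Python's built-in sqlite3 module. The table name "sales", the "salesperson" column and the sample rows are all made up for illustration; only "sales_amount" comes from the problem above. The idea is to group by the other column, order by the summed amount and keep the first row, which avoids repeating the bracketed clause. Note that if two groups tie for the maximum, this returns only one of them.

```python
import sqlite3

# Hypothetical data: the table and the "salesperson" column are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson TEXT, sales_amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Ann", 100.0), ("Bob", 250.0), ("Ann", 200.0), ("Cid", 50.0)],
)

# Group by the column we want to see, sum within each group,
# then sort by the total and keep only the largest one.
query = """
    SELECT salesperson, SUM(sales_amount) AS total
    FROM sales
    GROUP BY salesperson
    ORDER BY total DESC
    LIMIT 1
"""
top_seller, top_total = conn.execute(query).fetchone()
print(top_seller, top_total)
conn.close()
```

The same ORDER BY and LIMIT pattern should also work in MySQL, but I have only sketched it against SQLite here.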

Anyone know better solutions?

Is Excel kidding me with ” ‘ ” ?

I just took my midterm exam, and it really kicked my ass. Obviously it had something to do with Excel, and it was not a pleasant experience.

One question required me to convert a cell from a date to the day of the week. I fully understood the formula, which is =TEXT(A2, "dddd"). However, because I have been busy teaching myself programming languages this semester, I wrongly assumed that a single quote (') would work the same as a double quote ("), which looks quite weird written down like this, doesn't it?

I could not believe that Excel's formulas still follow such outdated rules. How is it possible that it does not support single quotes? I spent about half an hour on the question and was still unaware of the problem. Only after submitting the exam did I realize that this could be the cause.

So be careful the next time you use a format string in Excel. DO NOT USE SINGLE QUOTES!

Hands on the Tkinter module but a sad story….

These days I've been studying Python's Tkinter module quite hard because I find it really interesting. It is a tool that can make my teenage dreams come true, since I have long wanted to make something visually enjoyable.

To be honest, learning Tkinter is pretty complex because of the large amount of prerequisite knowledge about the Tk programming "protocol". I have only worked with it for 3 days, but I want to create some programs to keep up my confidence and interest in learning. So, I want to share a sad but true story told with the Tkinter module. Here's the code:

Please don’t laugh at me, because I want to keep writing….(T_T)

Come on Python! Just Read My Clipboard

I don't know whether you guys have been annoyed by MyStatLab, powered by Pearson. These days I have been busy doing homework on it. My homework is about statistical analysis and requires a lot of work in RStudio. However, as everybody knows, R doesn't support the xls format very well, so users have to convert the xls file into a csv file first.

That might not seem like a big deal as long as you have a csv file, a format that is ubiquitous in the data world. Nevertheless, MyStatLab doesn't provide the csv format and gives users the xls format instead! It gives the user two options: one is to copy the data sheet to the clipboard; the other is to download it as an Excel file.

This was a total nightmare when I tried to do my work on MyStatLab. Since I wanted to avoid the uncertainties of reading a txt file in RStudio (which would mean copying and pasting the data into a txt file, and you know we can't simply rename ".txt" to ".csv" because it won't work!), I had to download the Excel file, open it in Excel, save it as a csv file, and then read it into RStudio. All of that is repetitive, time-consuming and unpleasant work.


After finishing my job painfully and stupidly, I calmed down and tried to solve the problem in Python. I hate repetitive work, and I believe a machine should do it better. Therefore, I wrote a small program to read my clipboard and save its content as a csv file.

P.S. I did consider reading the Excel file and running a Python program to save it as a csv file, but I don't see any gain in efficiency compared with copying the data and running the Python script. Another reason for this approach is that Python supports writing csv files natively.

Here is the code:

I referred to an online example, which was quite useful, and blocked the Tkinter window from popping up automatically. I also added support for UTF-8 encoding, because I might use the program to convert csv files whose headers contain Chinese characters.
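For readers who want something to run, here is a minimal sketch of the approach described above, written for Python 3 (where the module is named tkinter rather than Tkinter). It is a reconstruction under assumptions, not the original script: the function names and the "clipboard.csv" output name are hypothetical, and it assumes the clipboard holds tab-separated text, which is what copying cells from a spreadsheet usually produces.

```python
import csv
import io


def clipboard_text_to_rows(text):
    """Split tab-separated clipboard text into a list of rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row]  # skip blank lines


def write_rows_to_csv(rows, path):
    """Write rows to a CSV file, encoded as UTF-8 so non-ASCII headers survive."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def read_clipboard():
    """Read the clipboard through Tkinter without showing a window."""
    import tkinter  # imported here so the helpers above work without a display

    root = tkinter.Tk()
    root.withdraw()  # hide the empty root window so nothing pops up
    try:
        return root.clipboard_get()
    finally:
        root.destroy()


def main(path="clipboard.csv"):  # hypothetical output file name
    write_rows_to_csv(clipboard_text_to_rows(read_clipboard()), path)
```

To use it, copy the data sheet to the clipboard and call main(); the resulting file can then be loaded in RStudio with read.csv.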

Something I like about SAS

I've been actively learning SAS for about a week. During this time, I found out something about SAS that has kept it at the top of statistical software for a long time. I want to write down my impressions here, as a reminder.

First of all, I want to ask a question: what software do you think of when you hear other people talk about statistics? Answers may vary: R, Python, SAS, STATA, SPSS... To be honest, there is one of them I had never touched, and that is SAS. Because of my undergraduate major, I used STATA quite a lot, and my whole paper relied on it. Additionally, I have tried to broaden my view of statistical software, so I have installed and played with RStudio and SPSS before. Apart from the varied capabilities of R, no statistical software had stunned me. At least that was my initial impression. I love Python very much, but I don't have much experience with its statistics-related packages such as NumPy.

Only when I was forced to learn SAS did I discover its most interesting feature: it separates the data input procedure from the data analysis procedure with the DATA and PROC commands! I appreciate this design very much, because as the command file grows chunkier, it helps you find what you want to navigate to more efficiently. There is a learning curve at first, but once you get the hang of it, you will treasure the clarity it gives you.

When I wrote my undergraduate paper using STATA, I thought of it as a Python-like statistical package because the two have similar coding logic. It is simple, but still not easy for a non-programmer to understand at once. As for R, it simplifies a lot of the coding and makes life easier for statisticians, but it is much harder to learn than SAS because it is more versatile and contains a lot of packages that researchers do not need. It's true that people have different tastes, which is also why there are many similar but different tools in one area.

Frankly speaking, the coding structure of SAS is not as efficient as that of R. For example, I have to type "RUN;" every time to make it run. What I like is the design of its logic, which separates data input from analysis. Just think about it: if Python could open every type of data file with a "DATA" command, it would be a great convenience!

Generating unbreakable password ?

The other day, while I was busy learning SAS and R in the workshop provided by the MSBA program at USC, I thought a lot about my next Python program. All the programs I write are driven by my interest in Python, so until I find something interesting again, I will hardly write anything notable.

I came across a post online that inspired this short program. Unfortunately it was written in Chinese; its main idea is the security of your password. To sum up, there are a lot of encryption rules, and most of them are related to MD5 (Message-Digest Algorithm), which generates a hash value. However, with fast-growing technology, MD5 hashes can also be cracked using supercomputers. Many people would recommend a newer algorithm, but in my opinion cracking it is only a matter of time.

Therefore, what I am going to do is make password cracking more difficult. I will give the password a random shuffle and run it through a for loop to increase the complexity and the time needed to break it. Here is my simple idea:
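The original snippet is not reproduced here, so what follows is only a rough sketch of the shuffle-then-loop idea, under my own assumptions. The function name, the seed parameter and the default number of rounds are hypothetical, and SHA-256 is swapped in for MD5. A seeded shuffle is used so that the same password and seed always produce the same result; a truly random shuffle could never be checked again at login time.

```python
import hashlib
import random


def scramble_password(password, seed, rounds=1000):
    """Shuffle the characters, then hash the result `rounds` times."""
    chars = list(password)
    # A seeded generator keeps the shuffle reproducible, so the same
    # password and seed always give the same result.
    random.Random(seed).shuffle(chars)
    digest = "".join(chars).encode("utf-8")
    # Each extra round adds work for anyone trying to guess the password.
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


print(scramble_password("my secret", seed=2015))
```

This is a toy for illustrating the idea. For real storage, Python's standard library already offers salted, iterated hashing through hashlib.pbkdf2_hmac, which is a safer choice than a home-made loop.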

While writing this example, I found out that Sublime Text 3 surprisingly does not support the "raw_input" function, because it cannot pop up a window for users to type in a string. This makes ST3 annoying, since I have to keep a terminal open all the time.

Overall, I don't think any password is unbreakable, so security should be carefully considered every time you store one somewhere. What's more, we need to increase the time and cost of cracking, and also lower the value of breaking our passwords. Protect yourself on the Internet!

Jikexueyuan Spider

Jikexueyuan is one of my favorite websites for studying coding on my own. One of its biggest advantages is that the lessons are relatively short compared with the long and boring lessons on some other websites. However, its UI does a poor job of showing learners the connection between different classes. For example, I want to study HTML and CSS, but I cannot filter by lecturer or by time added, so it is extremely hard to work out the correct sequence after a class (time added is only available on each course's main page, not on the index page). The ONLY thing I can do is select a specific category such as HTML and face a number of unordered courses, which leaves me frustrated about which one to take next.

Therefore, I would like to design an application that retrieves all the related courses under a specific category, automatically visits the URL of each course in the background, and then retrieves the name and time added of each course. Finally, I want it to sort the courses by time added and write the information to a .txt file so that I can analyze it conveniently. Here is the code:
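As a hedged illustration of the last two steps only (sorting by time added and writing to a .txt file), here is a small sketch. The scraping step is left out because the page structure is not described here, so the sample records, the function names and the "YYYY-MM-DD" date format are all assumptions.

```python
from datetime import datetime


def sort_courses(courses):
    """Sort (name, time_added) pairs from oldest to newest."""
    return sorted(courses, key=lambda c: datetime.strptime(c[1], "%Y-%m-%d"))


def write_courses(courses, path):
    """Write one 'time_added<TAB>name' line per course to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, time_added in courses:
            handle.write(f"{time_added}\t{name}\n")


# Hypothetical records standing in for what the spider would collect.
sample = [
    ("CSS Selectors", "2015-06-12"),
    ("HTML Basics", "2015-03-01"),
    ("CSS Layout", "2015-08-30"),
]
ordered = sort_courses(sample)
for name, time_added in ordered:
    print(time_added, name)
```

Calling write_courses(ordered, "html_courses.txt") would then save the sorted list; the file name is just an example.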