Byeong-Hak Choe

Assistant Professor of Data Analytics and Economics
South Hall 227B
585-245-5367
bchoe@geneseo.edu

Academic Website:

Classes

  • DANL 210: Data Preparation & Management

    This course aims to provide an overview of how to manipulate, process, clean, and crunch datasets, with hands-on, practical case studies that show you how to solve a broad set of data analysis problems effectively. We will use a computing development environment, Jupyter Notebook, which is a shell and notebook for exploratory computing. This course will cover topics such as (1) loading, cleaning, transforming, merging, and reshaping data, (2) creating informative visualizations with Matplotlib, (3) slicing, dicing, and summarizing datasets, and (4) analyzing and manipulating regular and irregular time series data. We will cover these topics to solve real-world data analysis problems with thorough, detailed examples. Computing is done in Python (the de facto programming language in data analytics) using pandas (a practical, modern data science library for Python) in addition to NumPy and Matplotlib.

  • DANL 310: Data Visualization & Presentation

    This course covers tools and methodologies for representing data visually with well-presented, appealing graphics in order to understand data better and perform useful data analytics tasks. Topics covered in this course include (1) creating many types of graphs such as line graphs, scatter plots, bar charts, and more, (2) loading data from various sources for data visualization, (3) customizing graphics using various formats and styles including colors, fonts, lines, and more, (4) visualizing geographical data such as maps, and (5) presenting an overview with dashboards and telling a story with storyboards.

  • DANL 320: Big Data Analytics

    This course teaches you how to analyze big data sets and is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! Top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, and many more are all using Spark to solve their big data problems! With the Spark 3.0 DataFrame framework, Spark can run up to 100x faster than Hadoop MapReduce. This course will review the basics of Python and then move on to using the Spark DataFrame API with the latest Spark 3.0 syntax! In addition, you will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib). You will also gain hands-on experience using PySpark within the Jupyter Notebook environment. This course also covers the latest Spark technologies, like Spark SQL and Spark Streaming, as well as advanced data analytics modeling methodologies.

  • MGMT 095: Excel Tutorial