Welcome back. This is the first lecture of the second week. This week, we'll discuss four major disciplines in spatial data science and applications. In the first week, we started with the definition of spatial data science, and later we discussed the unique aspects of spatial data science. To the question "Why is spatial special?", one of the answers was that spatial is not simply a GIS, DBMS, data analytics, or big data problem. It is rather a combination of all these disciplines. I would like to suggest four disciplines in which to have some expertise in order to claim the role of spatial data scientist. They are, as you can see, GIS, Spatial Database Management System, Spatial Data Analytics, and Big Data System. Application domain knowledge will definitely be a big plus, in areas such as business, urban planning, transportation science, public health, disaster management, and any domain where spatial data are intensively used. Now, let us briefly review each discipline one by one.
GIS. GIS stands for Geographic Information System. It can be defined as a system which collects, stores, manages, queries, analyzes, and visualizes spatial data. In a wider definition, it is a system which combines hardware, software, data, people, organization, and policy regarding spatial data. GIS can be considered a limited version of CAD for spatial data production, of a database management system for data management, and of a data analysis tool. The most powerful aspect of GIS is that it is a geo-visualization tool specifically designed for cartographic representations of spatial data. In other words, map-making is well supported. There are quite a few GIS software packages in the commercial market, as well as in the open-source software domain. ArcGIS is undisputedly the leading GIS software. Together with Hexagon (Intergraph) GeoMedia, General Electric Smallworld, and Pitney Bowes MapInfo, these are the four major commercial GIS packages.
On the other hand, Quantum GIS, also known as QGIS, is the most popular open-source GIS software. The advantage of using GIS is that it can cover, to a certain degree, almost all the processes of spatial data science and applications: spatial data production, spatial data management, basic query, geo-processing, some advanced analysis, and geo-visualization. In a sense, it can do everything. On the other hand, GIS has the limitation that it handles almost all these aspects of spatial data science only with limited capability. It has DBMS functionality, but it does not support a standard query language and is generally designed for a single user, meaning that GIS generally does not support concurrency control. It has some analytical power, but it is not full-fledged. Furthermore, generally no big data functionality is provided.
Spatial Database Management System. What is a spatial DBMS? A spatial database management system is a database which can store and query not only conventional data sets of numbers and text, but also user-defined data types for spatial data, such as points and polygons for vector objects and images for raster objects. It became feasible after the object-relational DBMS was developed. Spatial DBMS software is basically an object-relational DBMS with built-in spatial data types and a spatial query language. Examples include Oracle Spatial, an extension of the market-leading DBMS Oracle; PostGIS, an open-source extension of PostgreSQL; and Microsoft SQL Server. Some limited versions of spatial DBMS are SpatiaLite and ArcGIS. Spatial DBMS inherits all the DBMS functionality, so removal of redundant and inconsistent data, concurrency control, and backup and recovery are supported. In other words, the ACID properties, which stand for Atomicity, Consistency, Isolation, and Durability, are supported. In addition to that, as the name suggests, it has spatial data types, spatial operations, and a spatial query language.
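To make "spatial data type" and "spatial operation" a little more concrete, here is a minimal sketch in plain Python, not in any real DBMS. The names Point, Polygon, contains, and select_within are hypothetical and chosen only for illustration. The contains function uses the standard ray-casting test, which is comparable in spirit to a containment predicate such as ST_Contains in PostGIS; a real spatial DBMS would provide this kind of predicate built in and back it with spatial indexing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Point:
    """Hypothetical vector data type: a single location."""
    x: float
    y: float


@dataclass(frozen=True)
class Polygon:
    """Hypothetical vector data type: a simple ring of (x, y) vertices."""
    vertices: Tuple[Tuple[float, float], ...]


def contains(polygon: Polygon, point: Point) -> bool:
    """Ray casting: cast a horizontal ray to the right of the point and
    flip 'inside' each time the ray crosses a polygon edge."""
    inside = False
    verts = polygon.vertices
    n = len(verts)
    for i in range(n):
        x1, y1 = verts[i]
        x2, y2 = verts[(i + 1) % n]
        # Only edges that straddle the point's y coordinate can be crossed.
        if (y1 > point.y) != (y2 > point.y):
            x_cross = x1 + (point.y - y1) * (x2 - x1) / (y2 - y1)
            if point.x < x_cross:
                inside = not inside
    return inside


def select_within(rows: List[Tuple[str, Point]], polygon: Polygon) -> List[str]:
    """A toy 'spatial query': names of rows whose geometry lies in the polygon."""
    return [name for name, geom in rows if contains(polygon, geom)]


# Made-up data: a rectangular district and three store locations.
district = Polygon(((0, 0), (4, 0), (4, 3), (0, 3)))
stores = [("A", Point(1, 1)), ("B", Point(5, 1)), ("C", Point(3.5, 2.5))]
inside_names = select_within(stores, district)
print(inside_names)  # ['A', 'C']
```

In this sketch every row is tested one by one; the point of a spatial DBMS is that such a filter can be written declaratively in the query language and answered without scanning every row.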
Spatial data indexing and spatial query optimization are supported as well. However, it also has some limitations. Spatial DBMS supports only a limited set of operations in its query language, so it cannot deal with high-end modeling and analytics of spatial data. Visualization is also a big limitation. At the same time, just like any other DBMS, spatial DBMS is weak at big data management, which is based on a different paradigm.
Now, data analytics. It can be defined as the science of examining and processing raw data with the goal of finding useful information. Generally, it takes a series of processes such as cleansing, selecting, transforming, modeling, and summarizing data. There are a variety of data analytics tools, domain-specific or generic, commercial or open-source. R is the most renowned data analytics tool in the open-source community, and it has its own ecosystem in which any developer can build his or her own package for spatial data analytics. There are also quite a few packages based on R, and KNIME and RapidMiner are also popular open-source solutions. In the commercial arena, SAS and SPSS are very powerful and popular business data analytics tools. If you want more freedom in data analytics, you can also consider working directly in programming languages with many data analysis libraries, such as Python and Java, or in computing environments such as MATLAB. Data analytics tools are generally easy to use. They have the big advantage of an intuitive graphical user interface and a simple scripting language. They provide a variety of built-in generic data analytics functions and can connect with other systems such as GIS, spatial DBMS, and big data systems. Another good thing is that there are quite a few excellent tools in the open-source community. Unfortunately, among those data analytics tools, none is really designed specifically for spatial data analysis, except for the tools within GIS software.
Among generic tools, R, for example, has some packages for spatial data analytics. Data analytics tools certainly lack spatial data management. Some visualization capability is provided, but it is very limited for geo-visualization, such as cartographic mapping.
Now, Big Data Systems. In the past, high-performance computing generally meant distributed systems with task parallelism, meaning that computation jobs are distributed to each node. In the era of big data, distribution of data, as well as distribution of computation jobs, is required. The concept and programming model for this is called MapReduce. Hadoop is an open-source implementation of the MapReduce programming model, with a file system named the Hadoop Distributed File System (HDFS). Hadoop by itself is limited to supporting distributed storage and computation, and has no support for accessing or updating individual data records. HBase is a non-relational, distributed database that adds read and write operations to the Hadoop framework. Hive gives an SQL-like interface for data summary, query, and basic analysis. YARN, which became available with Hadoop 2.0 and stands for Yet Another Resource Negotiator, is the Hadoop cluster resource management tool for scheduling and allocating computing resources. Likewise, additional needs have brought new solutions into the Hadoop ecosystem, and it keeps growing and changing. The Hadoop framework provides capabilities for big data management and analytics which used to be impossible with existing technology. For big data processing, it offers both scalability and resilience for data management, and flexibility and speed for data analytics. Additionally, these are all open-source solutions. These are all big advantages of using a big data system. As for disadvantages, the most critical issue of big data systems for us is that they are not designed for spatial data.
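The MapReduce programming model mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the idea running in a single process, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job), the grid-cell key, and the sample points are all assumptions made up for the example. The job counts how many points fall into each square grid cell, with the input already split into partitions as it would be across nodes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Record = Tuple[float, float]


def map_phase(record: Record, cell_size: float) -> Tuple[Cell, int]:
    """Map: turn one (x, y) point into a (grid cell, 1) pair."""
    x, y = record
    return (int(x // cell_size), int(y // cell_size)), 1


def shuffle(pairs: List[Tuple[Cell, int]]) -> Dict[Cell, List[int]]:
    """Shuffle: group all mapped values by key."""
    groups: Dict[Cell, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Cell, values: List[int]) -> Tuple[Cell, int]:
    """Reduce: aggregate the values of one key, here by summing."""
    return key, sum(values)


def run_job(partitions: List[List[Record]], cell_size: float) -> Dict[Cell, int]:
    mapped: List[Tuple[Cell, int]] = []
    # Each partition stands in for data stored on a different node.
    for partition in partitions:
        mapped.extend(map_phase(record, cell_size) for record in partition)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input: four points split across two partitions.
partitions = [
    [(0.5, 0.5), (1.5, 0.2)],
    [(0.9, 0.1), (1.1, 1.9)],
]
counts = run_job(partitions, cell_size=1.0)
print(counts)  # {(0, 0): 2, (1, 0): 1, (1, 1): 1}
```

Note that each record is mapped without reference to any other record, which is the kind of independent computation this model handles well.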
Hadoop has a shared-nothing architecture, so it is good only for independent computations on big data. Advanced spatial analysis can hardly be implemented in the Hadoop framework. And certainly, there is no capability for spatial or geo-visualization.
So far, we have briefly reviewed the four disciplines that one needs to understand in order to claim the role of spatial data scientist. They are GIS, spatial DBMS, data analytics, and big data systems. It is obvious that each discipline has pros and cons, advantages and disadvantages, and that they need each other. For that reason, an integrated framework of the four systems is required. The figure on the side shows an integrated framework of the four disciplines, in which the disciplines are connected and communicate with each other through data flows, so that only the most powerful aspect of each discipline is used. For example, geo-visualization is done with GIS and advanced analysis with data analytics tools. Throughout this course, you will learn each discipline from basic to advanced. All right, this is the end of this lecture. Thank you for your attention, and see you in the next lecture.