Software Installation and Environment Configuration

This section describes the installation and configuration of the experimental environment used in this book.

This book introduces the use of Python for data processing and does not cover how to use general-purpose software. However, to view GIS data you need desktop GIS software. QGIS ( http://qgis.org ) is recommended; if you already have other desktop GIS software, such as ArcGIS, you can use that instead.

Open Source GIS Introduced in This Book

Open source GIS software and programs are numerous, and the ecosystem as a whole is huge, so this book cannot explain them one by one. Even within its own subject (Python and open source GIS), it can only cover software and class libraries that are currently mature and widely used. The focus is on using Python as a programming language for data processing. The following class libraries are the main ones used.

  • GDAL/OGR, for reading and writing raster and vector data;

  • Proj.4, for map projection processing;

  • Shapely, for spatial analysis of data;

  • SpatiaLite, a small spatial database;

  • Mapnik, for cartography;

  • Basemap, another set of cartographic tools;

  • Other libraries, including pyshp, geojson, Descartes, GeoPandas, and Folium, which are introduced briefly in this book.

Installation and Configuration of Debian Linux

Perhaps the first problem in learning open source GIS is the choice of operating system. A large amount of open source software is developed on the GNU/Linux platform. Although much of it has been ported to Windows, many packages remain difficult to install there, and GIS software tends to cause even more problems.

To make better use of open source GIS tools, this book recommends learning on a Debian or Ubuntu Linux operating system. For Debian, use Debian Stretch (Debian 9), released in 2017; for Ubuntu, use Ubuntu Bionic Beaver (Ubuntu 18.04), released in 2018. The two systems differ little in operation, and all the code in this book has run successfully on both. However, the versions of some class libraries, especially the newer ones, differ between the two systems, so the results of running the code may not be identical. This book was written on Debian Stretch with Linux kernel 4.9, and the operations and instructions in the book are based on that system.

Debian is a volunteer organization dedicated to free software development and to promoting the ideals of the Free Software Foundation. It was founded in 1993, when Ian Murdock sent an open letter inviting software developers to help build a complete software distribution based on the then-new Linux kernel. Over the years, this group of enthusiasts, once funded by the Free Software Foundation and influenced by the GNU philosophy, has grown into an organization with more than 1,000 Debian developers.

Check the kernel version of the running Debian 9 system:

$ uname -a
Linux v 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
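
The same information can also be read from within Python. The following is a minimal sketch using the standard-library platform module; the helper name describe_system is an illustrative choice and does not come from the book's code.

```python
import platform


def describe_system():
    """Return the operating system name, kernel release, and machine type."""
    info = platform.uname()
    return {
        "system": info.system,    # operating system name, e.g. "Linux"
        "release": info.release,  # kernel release, e.g. "4.9.0-6-amd64"
        "machine": info.machine,  # hardware architecture, e.g. "x86_64"
    }


print(describe_system())
```

The printed values depend on the machine where the code runs, so they will match the uname output above only on the same system.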

Installing Related Components and Class Libraries

Debian is the basis of Linux distributions such as Ubuntu, Linux Mint, and elementary OS. It has a robust package management system: every component and application in Debian is built into a package that is installed on the system. Debian manages this package system with a set of tools called the Advanced Packaging Tool (APT).

In Debian-based Linux distributions, various tools interact with APT to help users install, remove, and manage software packages. apt-get is one of the most popular command-line tools, and aptitude is another tool that offers both a text-based interactive interface and a command-line interface.

The apt command is newer. Its first stable version was released in 2014, and it came to wide attention with the release of Ubuntu 16.04 in 2016. The apt command was introduced to solve the problem of package management functions being scattered across too many commands. It gathers the most widely used options of apt-get, together with some less frequently used functions from apt-cache and apt-config.

Linux distributions now recommend the apt command-line tool. It has fewer options to remember and is therefore easier to use, and more importantly it provides the options needed for everyday Linux package management. apt-get has not been abandoned, but ordinary users should prefer apt.

Python is widely used today, and recent Linux distributions ship with it. All the functions described in this book were tested successfully in Python 3; using Python 2 makes little difference.

To view the versions of software and class libraries, along with their dependencies, you can use the apt command with different options:

apt show gdal-bin
apt depends gdal-bin
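
When the dependency list needs to be processed in a script, it can be parsed with a few lines of Python. The sketch below assumes that each dependency appears on its own line in the form "Depends: package", which is how apt-cache depends prints it. The function name parse_depends and the sample text are illustrative only and are not real output from the system used in this book.

```python
def parse_depends(output):
    """Extract package names from lines marked 'Depends:' in apt-style output."""
    packages = []
    for line in output.splitlines():
        # Alternative dependencies are prefixed with '|', so strip that as well.
        entry = line.strip().lstrip("|").strip()
        if entry.startswith("Depends:"):
            packages.append(entry.split(":", 1)[1].strip())
    return packages


# Illustrative text only; real output depends on the installed package versions.
sample = """gdal-bin
  Depends: libc6
  Depends: libgdal20
  Suggests: python-gdal
"""

print(parse_depends(sample))
```

To parse real output, the text can be obtained with subprocess.check_output(["apt-cache", "depends", "gdal-bin"], universal_newlines=True). apt-cache is used here instead of apt because apt itself warns that its command-line interface is not stable for use in scripts.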

See the figure for component dependencies between class libraries.

Open Source GIS Class Library Dependency

On a Debian system, the tools introduced in this book can be installed with the following command (run as root):

apt install python3 python3-gdal gdal-bin \
    python3-pyproj proj-bin python3-shapely \
    fiona python3-fiona \
    python3-mapnik libspatialite7 \
    libsqlite3-mod-spatialite spatialite-bin \
    python3-mpltoolkits.basemap \
    python3-geopandas  python3-nose \
    python3-pygraphviz
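
After installation, it can be useful to confirm that Python is able to find the libraries. The following is a minimal sketch using the standard-library importlib module. The import names in MODULES are assumptions about how each library is exposed to Python (for example, GDAL is imported as osgeo), and the helper name is_available is illustrative.

```python
import importlib.util

# Import names assumed for the libraries installed above.
MODULES = {
    "GDAL/OGR": "osgeo",
    "pyproj": "pyproj",
    "Shapely": "shapely",
    "Fiona": "fiona",
    "Mapnik": "mapnik",
    "Basemap": "mpl_toolkits.basemap",
    "GeoPandas": "geopandas",
}


def is_available(module_name):
    """Return True if Python can locate the module (parent packages are imported)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no valid spec.
        return False


for label, module_name in MODULES.items():
    status = "found" if is_available(module_name) else "missing"
    print("{:10s} {}".format(label, status))
```

A library reported as "missing" usually means that the corresponding package was not installed, or that it was installed for a different Python interpreter.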

In addition to the software already packaged in Debian, some Python tools cannot be installed with the apt command. Python provides its own package installation tool, pip. pip is a modern, general-purpose Python package manager that can find, download, install, and uninstall Python packages, and it is the installation tool currently recommended by the PyPA (Python Packaging Authority). Besides pip, Python packaging tools include easy_install, setuptools, and distribute.
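
pip can also be called through the Python interpreter itself, which ensures that packages are installed for the interpreter actually in use. The sketch below only builds and prints such a command; the helper name pip_command is hypothetical, and the package names folium and descartes are used purely as examples.

```python
import sys


def pip_command(action, *args):
    """Build a pip command line that uses the current Python interpreter."""
    return [sys.executable, "-m", "pip", action] + list(args)


# Example: the command that would install two packages for the current user.
print(pip_command("install", "--user", "folium", "descartes"))
```

To actually run the command, the resulting list can be passed to subprocess.check_call.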

Use of Virtual Machines

Most readers may not have used Linux, let alone have their own experimental environment. Fortunately, this problem is now easy to solve with software, without investing in new equipment.

The following introduces virtualization technology and virtual machine software. These technologies are useful not only for studying this book but also in everyday work and development.

Virtualization Technology and Virtual Machine Software

Virtualization is the process of creating software-based (virtual) representations of components such as applications, servers, storage, and networks. It is an effective way for enterprises of all sizes to reduce IT overhead and improve efficiency and agility. Virtualized components run on a virtual layer rather than directly on real hardware. Virtualization can extend the capacity of hardware and simplify software reconfiguration. CPU virtualization can simulate multiple CPUs running in parallel on a single CPU, allowing one platform to run several operating systems at the same time, with applications running in independent spaces without affecting each other, which improves the efficiency of computer use.

A virtual machine is a complete computer system, with full hardware functionality, that is simulated by software and runs in a completely isolated environment. Because a virtual machine has to simulate the underlying hardware instructions, applications run more slowly in it. Virtual machine software provides virtual machine functionality on different operating systems; examples include the open source VirtualBox and the commercial (but free) VMware Player. Since Windows 8, Microsoft has provided Hyper-V client software to support virtualization, which can run Windows and other operating systems as virtual machines on the same host. In Windows 10, Microsoft went further and introduced the “Windows Subsystem for Linux”, which integrates Linux (currently five distributions are supported, including Ubuntu and Debian) more closely with Windows and lets you run Linux commands directly within Windows.

Introduction to VirtualBox

Readers can choose a variety of virtual machine software for experimentation. This book recommends the use of open source VirtualBox.

VirtualBox is open source virtual machine software. It was originally developed by the German company Innotek and later distributed by Sun Microsystems. It is written with Qt and was formally renamed Oracle VM VirtualBox after Oracle acquired Sun. Innotek released VirtualBox under the GNU General Public License (GPL) and provided both a binary version and an OSE (Open Source Edition) version of the code. Users can install and run Solaris, Windows, DOS, Linux, OS/2 Warp, BSD, and other systems as guest operating systems on VirtualBox. It is now developed by Oracle as part of Oracle’s xVM virtualization platform.

VirtualBox is often described as the most capable free virtual machine software. It is feature-rich, performs well, and is easy to use. Supported guest systems include Windows (all versions from Windows 3.1 to Windows 10 and Windows Server 2012), Mac OS X, Linux, OpenBSD, Solaris, IBM OS/2, and even Android. There are two editions of VirtualBox: one is the free but non-open-source VirtualBox, and the other is the open source edition, called VirtualBox-OSE. Unless you are studying the development of virtual machine software itself, you can generally use the non-open-source edition.

Editor and IDE

Writing code requires an environment for interacting with the computer. These environments fall mainly into editors and IDEs, and there are many to choose from. Here is a brief introduction to some that the author has used.

An editor is a tool for entering text into a computer. A general code editor additionally supports features such as syntax highlighting, automatic indentation, and automatic completion. Some code editors can also run and debug code and integrate with version control software. Editors are generally not tied to a particular language.

An IDE (Integrated Development Environment) is an application that provides a development environment. Besides the functions of a code editor, it integrates tools commonly used in development, such as code analysis, compilation, debugging, and version control. An IDE is usually designed for a particular language.

Code editors are lighter and faster than IDEs but have far fewer built-in tools. A code editor is more convenient for writing a few simple lines of code, but as the code grows more complex, an IDE can significantly improve coding efficiency. It is still worth mastering one or more general code editors, because they are easy to install and an IDE may not be available in a constrained environment.

For Python development you can freely choose a general editor or an IDE, depending on the situation. Python itself also ships with IDLE, a simple integrated development environment.

On Windows, you can choose the free Notepad++, Atom, Visual Studio Code, GVim (Graphical Vim), or Sublime Text 3; on Linux, you can use the Linux versions of these programs (where available), as well as Vim, gedit, and so on. Vim/GVim has a somewhat longer learning curve and takes time to get used to, but it is more common in Linux environments.

On the IDE side, the most common open source tool is Eclipse, which can be used for Python development together with the PyDev plug-in; there is also the commercial PyCharm. Both tools are developed in Java and run across platforms.

PyCharm is an integrated development environment developed by JetBrains that is now widely popular among Python developers, and it is especially recommended here. It offers a set of features and tools that improve efficiency when developing in Python, including debugging, syntax highlighting, project management, code navigation, intelligent hints, automatic completion, unit testing, and version control.

If Web development is not involved, the free PyCharm Community Edition is sufficient; it can be downloaded from the official JetBrains website and installed (a Java runtime environment is required). In addition, educators and maintainers of open source projects (projects that have been active for more than one year) can apply for a free license for the Professional Edition. The PyCharm Professional Edition provides more features, such as installing a plugin to support LaTeX. The text editing and code debugging in the middle and late stages of writing this book were done in this environment.

The PyCharm Professional Edition license was applied for through the open source Web CMS framework TorCMS.