Access pixel of raster data in Python using GDAL

Raster data types in GDAL

DataType refers to the data type of the actual value in the image. The specific data type is defined in the gdalconst module.

First look at the specific usage and the results returned.

>>> from osgeo import gdalconst
>>> dir(gdalconst)
['CE_Debug', 'CE_Failure', 'CE_Fatal', 'CE_None', 'CE_Warning', ...

The beginning of those GDTs is the data type of the value. To view the data type of a band in an image, you only need to print the DataType property of this band.

>>> from osgeo import gdal
>>> rds = gdal.Open("/gdata/geotiff_file.tif")
>>> band = rds.GetRasterBand(1)
>>> band.DataType
1

The return result is integer. But what does this 1 mean?

The table lists the corresponding values of gdalconst and integers.

type

Gdalconst attribute

integer

Unknown or unspecified type

gdalconst.GDT_Unknown

0

8-bit inconsistent integers

gdalconst.GDT_Byte

1

16-bit inconsistent integers

gdalconst.GDT_UInt16

2

16 bit integer

gdalconst.GDT_Int16

3

32-bit inconsistent integers

gdalconst.GDT_UInt32

4

32-bit integer value

gdalconst.GDT_Int32

5

32-bit floating-point type

gdalconst.GDT_Float32

6

64-bit floating-point type

gdalconst.GDT_Float64

7

16-bit Complex integer

gdalconst.GDT_CInt16

8

32-bit Complex integer

gdalconst.GDT_CInt32

9

32-bit complex floating-point type

gdalconst.GDT_CFloat32

10

64-bit complex floating-point type

gdalconst.GDT_CFloat64

11

Table: Correspondence between Constant Quantity and Numeric Type in GDAL

According to the table, the value 1 indicates gdalconst.GDT_Byte. The data type here is corresponding to the type in NumPy.

Data access to data sets

The descriptive information of remote sensing images can be accessed by the methods described in the previous section, and some information such as acquisition time, processing time, spatial resolution and image size can be generally known. In order to process the remote sensing image, it is necessary to further access the data in the remote sensing image, that is, the gray value of the pixels in the image.

GDAL provides the following two functions to access image values:

  • ReadRaster() reads image data (in binary form);

  • ReadAsArray() reads image data (in the form of an array).

These are two very important functions. They can read the image data directly, and then calculate and analyze the raster data. These two functions have some parameters.

  • xoff,yoff : Specify the position (in pixels) of the origin of the full image in the entire image to be read.

  • xsize,ysize : Specifies the length and width (in pixels) of the rectangle to read a partial image.

  • buf_xsize,buf_ysize : You can zoom after reading a part of the image. Then use these two parameters to define the final width and height of the scaled image, GDAL will process the data to this size.

  • buf_type : can convert the type of data read (for example, the original data type is short, converted to byte).

  • band_list : Adapt to multi-band conditions. You can specify which band to read.

Here’s a brief introduction to how to get the data in the GeoTIFF file.

>>> from osgeo import gdal
>>> dataset = gdal.Open("/gdata/lu75c.tif")
>>> from numpy import *
>>> dataset.ReadAsArray(2500,2500,3,3)
array([[12, 12, 12],
       [12, 12, 12],
       [12, 12, 12]], dtype=int16)
>>> array([[12, 12, 12],
... [12, 12, 12],
... [12, 12, 12]],dtype=int16)
array([[12, 12, 12],
       [12, 12, 12],
       [12, 12, 12]], dtype=int16)
>>> dataset.ReadRaster(2500,2500,3,3)
'\x0c\x00\x0c\x00\x0c\x00\x0c\x00\x0c\x00\x0c\x00\x0c\x00 ...

In this way, the image in the upper left corner of the image is located at \((3340,3780)\), and the width and height are \(10\) pixels are read out. The results returned by these two functions are different, where ReadAsArray() reads an array of NumPy of type int16 ; and ReadRaster() It is a binary type. ReadAsArray() For more explanation of the result type, see the table in the previous section.

Read data in band

Use of ReadRaster() function

Here’s an example: Using the ReadAsArray() function returns an array defined by the NumPy module.

>>> from osgeo import gdal
>>> from gdalconst import *
>>> dataset = gdal.Open("/gdata/geotiff_file.tif")
>>> band = dataset.GetRasterBand(1)
>>> band.ReadAsArray(100,100,5,5,10,10)
array([[48, 48, 54, 54, 53, 53, 51, 51, 45, 45],
       [48, 48, 54, 54, 53, 53, 51, 51, 45, 45],
       [32, 32, 40, 40, 45, 45, 46, 46, 44, 44],
       [32, 32, 40, 40, 45, 45, 46, 46, 44, 44],
       [22, 22, 34, 34, 43, 43, 58, 58, 57, 57],
       [22, 22, 34, 34, 43, 43, 58, 58, 57, 57],
       [39, 39, 52, 52, 56, 56, 56, 56, 49, 49],
       [39, 39, 52, 52, 56, 56, 56, 56, 49, 49],
       [59, 59, 61, 61, 49, 49, 42, 42, 39, 39],
       [59, 59, 61, 61, 49, 49, 42, 42, 39, 39]], dtype=uint8)

In the fifth line above, the parameters in the ReadAsArray() function, the first two 100 are the (x,y) position of the pixel in the upper left corner of the value window in the image data; The two are the size of the area covered by the value window; followed by the value of the array after the value window takes the array and scales it. It should be noted that the buffer size is automatically assigned according to the parameters, and can be specified. If not specified, it is consistent with the 3rd and 4th parameters. After 5 and 6 parameters are set, you can zoom. If the value window is \(20\times 20\) , the array can be manually set after being taken out. Let it be an array of \(10\times 10\) or an array of \(40\times 40\). If set to an array of \(20\times 40\) then the array fetched will be distorted for the real image.

It should be noted here that when shrinking, the average of several surrounding points is used for resampling. If the data of the palette type is scaled, it may cause problems.

Processing of raster data range

When reading data, it may exceed the length or height of the data itself. Take a look at the following example.

>>> band.XSize
10572
>>> band.YSize
9422
>>> band.ReadAsArray(95,100,5,5)
array([[ 52,  85,  81, 101,  77],
       [ 67,  86,  89, 105,  62],
       [ 69,  76,  86,  93,  38],
       [ 66,  68,  85,  76,  33],
       [ 60,  71,  89,  75,  47]], dtype=uint8)
ERROR 5: Access window out of range in RasterIO().  Requested
(95,100) of size 5x5 on raster of 100x100.

As you can see, something went wrong. The data can not be obtained beyond the bounds. When calling, we should first make a judgment, rather than leaving the exception to the library.