Use Python for carbon calculation from raster files (VRT)
Intro
In this PR, we switch the underlying technology for carbon stats calculation, which queries large TIF data, from PostGIS to Python. This lets us skip loading the rasters into the DB, a step that is time- and resource-consuming and adds operational complications (backups, uploading, restoring, tuning…). Instead, we read directly from the TIF files on disk, using Python packages such as `rasterio` and `numpy`, and we run the computation in parallel. In testing on a local computer, returning the stats for any polygon selection took at most 10 seconds. The run time stays predictable because we always sample at most 2500 random points from the polygon (the limit can be changed).

Our backend is in Clojure, so we use `libpython-clj` to bridge Python and the JVM (on which Clojure runs) and receive the result as a map. The new system aims to return data in the same shape as the previous one, so that it can serve as a drop-in replacement.

The main requirement is a VRT mosaic raster. The default location, `carbon_tiles/mosaic.vrt`, can be changed via the `RASTER_PATH` constant; a possible future improvement is to make this configurable from `config.edn`. The script `carbon_tiles/create_vrt.sh` generates this VRT (a quick step). The script `carbon_tiles/optimize_tifs.sh` covers an optional step that optimizes the raster tiles (enabling tiling and craning in 512x512 block size).
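The sampling approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (`point_in_ring`, `sample_points`, `read_chunk`, `carbon_stats`), the keys of the returned dict, the thread-based parallelism, and the worker count are all assumptions. It also assumes the polygon is a single ring already expressed in the raster's CRS, and it does not filter nodata values.

```python
import random
from concurrent.futures import ThreadPoolExecutor

MAX_POINTS = 2500                        # sample cap mentioned in the description
RASTER_PATH = "carbon_tiles/mosaic.vrt"  # default VRT location from the description


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring given as a list of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(ring, max_points=MAX_POINTS, seed=None):
    """Rejection-sample up to max_points random points inside the ring."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    points, attempts = [], 0
    # Bound the number of attempts so very thin polygons cannot loop forever.
    while len(points) < max_points and attempts < max_points * 100:
        attempts += 1
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_ring(x, y, ring):
            points.append((x, y))
    return points


def read_chunk(raster_path, points):
    """Read band-1 values at the given points; each call opens its own dataset."""
    import rasterio  # lazy import so the sampling helpers work without GDAL

    with rasterio.open(raster_path) as src:
        return [float(v[0]) for v in src.sample(points)]


def carbon_stats(ring, raster_path=RASTER_PATH, workers=4):
    """Return summary stats as a plain dict (hypothetical keys)."""
    points = sample_points(ring)
    if not points:
        return {"count": 0, "mean": None, "min": None, "max": None}
    # Split the points into roughly equal chunks, one per worker.
    chunks = [c for c in (points[i::workers] for i in range(workers)) if c]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: read_chunk(raster_path, c), chunks)
        values = [v for chunk in results for v in chunk]
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }
```

Because the number of sampled points is capped, the number of raster reads is bounded regardless of how large the polygon is. Returning a plain dict of numbers keeps the result easy to convert into a Clojure map on the JVM side.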
Requirements:
- Java 17 (needed for `libpython-clj` to work properly)
- GDAL should be installed
- `pip install numpy rasterio shapely geojson pyproj` (packages we depend on)
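A quick way to check the Python and GDAL side of these requirements before starting the backend could look like the sketch below. It is not part of this PR: the function names are hypothetical, and looking for the `gdalinfo` executable on `PATH` is only an assumed proxy for "GDAL is installed". The Java version is not checked here.

```python
import importlib.util
import shutil

# Packages listed in the pip install line above.
REQUIRED_PACKAGES = ["numpy", "rasterio", "shapely", "geojson", "pyproj"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def gdal_on_path():
    """Assumed proxy for a GDAL install: the gdalinfo CLI is on PATH."""
    return shutil.which("gdalinfo") is not None


if __name__ == "__main__":
    print("Missing Python packages:", missing_packages() or "none")
    print("GDAL CLI found:", gdal_on_path())
```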
To Test:
- Put some TIF files inside `carbon_tiles`
- Optionally optimize them (output to `optimized`)
- Run `create_vrt` to produce `mosaic.vrt`; the Python script reads it to do the calculations
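For readers without the shell scripts at hand, the steps above can be approximated from Python with the standard GDAL command-line tools. This is a sketch under assumptions: the actual contents of `create_vrt.sh` and `optimize_tifs.sh` are not shown in this description, so the commands below (`gdalbuildvrt`, and `gdal_translate` with the GeoTIFF `TILED`, `BLOCKXSIZE`, and `BLOCKYSIZE` creation options) are a plausible equivalent rather than what the scripts necessarily run. The function names are hypothetical.

```python
import glob
import os
import subprocess


def optimize_command(src, dst, block_size=512):
    """Build a gdal_translate command that rewrites a TIF with internal tiling."""
    return [
        "gdal_translate",
        "-co", "TILED=YES",
        "-co", f"BLOCKXSIZE={block_size}",
        "-co", f"BLOCKYSIZE={block_size}",
        src,
        dst,
    ]


def build_vrt_command(tif_paths, vrt_path="carbon_tiles/mosaic.vrt"):
    """Build a gdalbuildvrt command that mosaics the given TIFs into one VRT."""
    return ["gdalbuildvrt", vrt_path, *tif_paths]


def build_mosaic(tile_dir="carbon_tiles", vrt_path="carbon_tiles/mosaic.vrt"):
    """Run gdalbuildvrt over every .tif in tile_dir (requires GDAL on PATH)."""
    tifs = sorted(glob.glob(os.path.join(tile_dir, "*.tif")))
    if not tifs:
        raise FileNotFoundError(f"no .tif files found in {tile_dir}")
    subprocess.run(build_vrt_command(tifs, vrt_path), check=True)
    return vrt_path
```

A VRT is a small XML file that references the source tiles rather than copying them, which is why generating it is quick.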
Closes FCF-72