I recently got a dataset made up of 2364 images files representing 1km x 1km tiles of 5m resolution Digital Terrain Model (DTM).  There are also another 2364 files of 1km x 1km 2m resolution Digital Surface Model (DSM).  I want to serve these with Geoserver which prefers fewer, larger files over more smaller files.

lots and lots of little files

GDAL is my tool of choice for batch processing images and in this case I want to use gdalwarp or gdal_merge to create my larger tiles.  The files are named logically based on the National Grid - NO4321.tif - and I can use this to create 10km x 10km tiles from the 1km x 1km tiles.  If I take the first (4) and third (2) numbers after the grid square reference (NO) I get the 10km tile reference - NO42.

10km x 10km tile made up from 1km x 1km tiles

So, how do I automate this process to create my tiles?

First up I need some text files, one for each 10km tile, with the list of image tiles making up the larger tile.  The following batch command has a nested loop that iterates through two sets of numbers (0 to 9) and substitutes those in the DIR command to find all image filenames that match the string (the ? represents a single character in the string):

FOR /L %%G IN (0,1,9) DO (  FOR /L %%H IN (0,1,9) DO (DIR /b /s NO_tiflzw\NO%%G?%%H?.tif > NO%%G%%H.txt  ))

I end up with a directory of text files like NO00.txt, NO01.txt, NO02.txt, etc.  Each text file has a list of the 1km tiles that make up the 10km tile.

gdal_merge and gdalwarp can take a text file list as an input so now we can loop through the list of text files and generate a 10km tile for each one:

FOR /F %%I IN (*.txt) DO gdal_merge -o NO_tifmerged%~nI.tif -of   GTiff -co TFW=YES -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co COMPRESS=LZW -ot Float32 -v --optfile %%I

This creates an internally tiled, LZW compressed, floating point GeoTIFF with accompanying world file.  If you want more control over the output raster try gdalwarp instead:

FOR /F %%I IN (*.txt) DO gdalwarp -s_srs EPSG:27700 -t_srs EPSG:27700 -ot Float32 -r average --config GDAL_CACHEMAX 500 -wm 500 -of GTiff -co TFW=YES -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co COMPRESS=LZW --optfile %%I NO_tifwarped%~nI.tif
single, merged 10km tile`

After this I can generate some stats for each image:

FOR /F %%G IN ('dir /b *.tif') DO gdalinfo %%G -stats

And some overviews with gdaladdo for performance:

FOR /F %%H IN ('dir /b *.tif') DO gdaladdo -r average %%H 2 4 8 16

*Remember, if you run this from the command line replace the "%%" with "%"

Now I have 35 tiles to serve instead of 2364 which should make Geoserver faster.

If you're running on Linux here are some simple Bash shell scripts which do the same as the Command shell loops above:

for i in {0..9}
 do
   for j in {0..9}
   do
     ls -Q NO/NO$i?$j?.asc > NO$i$j.txt
   done
 done
exit 0
for f in .txt;
 do
   gdalwarp -s_srs EPSG:27700 -t_srs EPSG:27700 -ot Float32 -r average
--config GDAL_CACHEMAX 500 -wm 500 -of GTiff -co TFW=YES -co TILED=YES
-co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co COMPRESS=LZW
--optfile $f NO_10k/"${f%.}".tif
 done
exit 0