Processing thousands of journeys: a pgRouting function
I have a dataset of pupils and the schools they attend. For each pupil I have a source XY coordinate and a target XY coordinate. I have a table of schools where the school name matches the name in the pupil table. I also have a full featured road network complete with road routing information and turn restrictions. All this sits in a PostgreSQL database with PostGIS and pgRouting installed. I need to use all this to calculate the shortest and quickest routes to school for each pupil. The results will then be used to inform some decision-making process about school transport and school location.
Two different solutions will be applied - routing from nearest node to nearest node and routing from nearest edge position to nearest edge position.
First, the pupil location needs to be associated with the nearest node on the road network.
-- Find the nearest network node for each pupil
CREATE TEMP TABLE temp_pupil_nn AS
SELECT DISTINCT ON (a.pupilid) a.pupilid, b.id AS nearest_node
FROM edn_pupilschoolroute a
LEFT JOIN hw_roadlink_vertices_pgr b ON ST_DWithin(a.geometry, b.the_geom, 1200)
ORDER BY a.pupilid, ST_Distance(a.geometry, b.the_geom);
This query creates a temporary table with a row for each pupil and the nearest node ID within a specified distance. You can find all the pupils that don't have a node within the specified distance. Adjust the distance in the ST_DWithin function until every pupil has a node assigned.
-- Find pupils without a node
SELECT a.pupilid
FROM edn_pupilschoolroute a
LEFT JOIN hw_roadlink_vertices_pgr b ON ST_DWithin(a.geometry, b.the_geom, 1200)
WHERE b.id IS NULL;
Once the temporary table is populated update the pupil table with the node id. You could probably do it in a wunner but I like separate steps.
-- Update pupil data table with NN information
UPDATE edn_pupilschoolroute
SET pupil_nn =
(SELECT nearest_node
FROM temp_pupil_nn b
WHERE b.pupilid = edn_pupilschoolroute.pupilid);
Repeat the steps above for each school.
Using this approach works well where there is a dense road network and the road node is a suitable proxy for pupil location but most of my network is in rural areas where nodes are fewer and further apart. This can lead to differences between the actual and modelled distances. To get around this we can route from the point on the nearest edge closest to the pupil to the point on the nearest edge closest to the school.
So, find the closest edge and position of the point on the edge for each pupil by creating a view.
-- Find the nearest network edge to each pupil
CREATE OR REPLACE VIEW view_source_edges AS
SELECT DISTINCT ON (a.pupilid) a.pupilid, b.ogc_fid AS nearest_edge, ST_Distance(a.geometry, b.centrelinegeometry) AS distance
FROM edn_pupilschoolroute a, hw_roadlink b
WHERE ST_DWithin(a.geometry, b.centrelinegeometry, 1200)
ORDER BY a.pupilid, ST_Distance(a.geometry, b.centrelinegeometry);
Do the same for schools with a view of the target edges.
-- Find the nearest network edge to each school
CREATE OR REPLACE VIEW view_target_edges AS
SELECT DISTINCT ON (a.schoolname) a.schoolname, b.ogc_fid AS nearest_edge, ST_Distance(a.geometry, b.centrelinegeometry) AS distance
FROM view_edn_dataschool a, hw_roadlink b
WHERE ST_DWithin(a.geometry, b.centrelinegeometry, 1200)
ORDER BY a.schoolname, ST_Distance(a.geometry, b.centrelinegeometry);
Make sure the pupil table has the required fields for the edge calculations
ALTER TABLE edn_pupilschoolroute
ADD COLUMN source_eid integer, --source edge id
ADD COLUMN source_pos double precision, --source edge position
ADD COLUMN target_eid integer, --target edge id
ADD COLUMN target_pos double precision, --target edge position
ADD COLUMN p_ndist double precision, --node to node distance
ADD COLUMN p_ntime double precision, --node to node time
ADD COLUMN p_edist double precision, --edge to edge distance
ADD COLUMN p_etime double precision; --edge to edge time
The last four columns added are those that are populated by the function. Update the first four columns with the edge ids and positions.
UPDATE edn_pupilschoolroute a
SET source_eid = b.source_eid,
source_pos = b.source_pos,
target_eid = c.target_eid,
target_pos = c.target_pos
FROM view_source_edges b, view_target_edges c
WHERE a.pupilid = b.pupilid
AND a.schoolname = c.schoolname;
The function to process all this information into routes goes like this:
-- Function: corporate.pupil_journeys_routes()
-- DROP FUNCTION corporate.pupil_journeys_routes();
CREATE OR REPLACE FUNCTION corporate.pupil_journeys_routes()
RETURNS character varying AS
$BODY$
DECLARE
cur_pupil refcursor;
v_pid integer;
v_seid integer;
v_spos float8;
v_teid integer;
v_tpos float8;
v_geom geometry;
v_sql varchar(1000);
BEGIN
RAISE NOTICE 'Processing pupil journeys...';
OPEN cur_pupil FOR EXECUTE format('SELECT pupilid, source_eid, source_pos, target_eid, target_pos FROM edn_pupilschoolroute WHERE pupil_nn IS NOT NULL');
LOOP
FETCH cur_pupil INTO v_pid, v_seid, v_spos, v_teid, v_tpos;
EXIT WHEN NOT FOUND;
SELECT ST_Collect(r.centrelinegeometry) AS geometry FROM pgr_trsp('
SELECT ogc_fid AS id,
source::integer,
target::integer,
cost_len::double precision AS cost,
rcost_len::double precision AS reverse_cost
FROM hw_roadlink'::text,
v_seid, --pupil source edge id
v_spos, --source edge position
v_teid, --school target edge id
v_tpos, --target edge position
true,
true,
'select to_cost, teid as target_id, feid||coalesce('',''||via,'''') as via_path from hw_nt_restrictions'::text) AS d
INNER JOIN hw_roadlink r ON d.id2 = r.ogc_fid INTO v_geom;
-- insert route cost into pupil data table
EXECUTE format('INSERT INTO %s(pupilid,geometry) VALUES ($1,$2)','corporate.edn_pupilroutes')
USING v_pid, v_geom;
END LOOP;
RETURN 'Oooh, get in!';
CLOSE cur_pupil;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
For 16,000 pupils and 57 schools this took about 2 hours to return a result.