I have a dataset of pupils and the schools they attend. For each pupil I have a source XY coordinate and a target XY coordinate. I have a table of schools where the school name matches the name in the pupil table. I also have a full featured road network complete with road routing information and turn restrictions. All this sits in a PostgreSQL database with PostGIS and pgRouting installed. I need to use all this to calculate the shortest and quickest routes to school for each pupil. The results will then be used to inform some decision-making process about school transport and school location.
Two different solutions will be applied - routing from nearest node to nearest node and routing from nearest edge position to nearest edge position.
First, the pupil location needs to be associated with the nearest node on the road network.
-- Find the nearest network node for each pupil CREATE TEMP TABLE temp_pupil_nn AS SELECT DISTINCT ON (a.pupilid) a.pupilid, b.id AS nearest_node FROM edn_pupilschoolroute a LEFT JOIN hw_roadlink_vertices_pgr b ON ST_DWithin(a.geometry, b.the_geom, 1200) ORDER BY a.pupilid, ST_Distance(a.geometry, b.the_geom);
This query creates a temporary table with a row for each pupil and the nearest node ID within a specified distance. You can find all the pupils that don't have a node within the specified distance. Adjust the distance in the ST_DWithin function until every pupil has a node assigned.
-- Find pupils without a node SELECT a.pupilid FROM edn_pupilschoolroute a LEFT JOIN hw_roadlink_vertices_pgr b ON ST_DWithin(a.geometry, b.the_geom, 1200) WHERE b.id IS NULL;
Once the temporary table is populated update the pupil table with the node id. You could probably do it in a wunner but I like separate steps.
-- Update pupil data table with NN information UPDATE edn_pupilschoolroute SET pupil_nn = (SELECT nearest_node FROM temp_pupil_nn b WHERE b.pupilid = edn_pupilschoolroute.pupilid);
Repeat the steps above for each school.
Using this approach works well where there is a dense road network and the road node is a suitable proxy for pupil location but most of my network is in rural areas where nodes are fewer and further apart. This can lead to differences between the actual and modelled distances. To get around this we can route from the point on the nearest edge closest to the pupil to the point on the nearest edge closest to the school.
So, find the closest edge and position of the point on the edge for each pupil by creating a view.
-- Find the nearest network edge to each pupil CREATE OR REPLACE VIEW view_source_edges AS SELECT DISTINCT ON (a.pupilid) a.pupilid, b.ogc_fid AS nearest_edge, ST_Distance(a.geometry, b.centrelinegeometry) AS distance FROM edn_pupilschoolroute a, hw_roadlink b WHERE ST_DWithin(a.geometry, b.centrelinegeometry, 1200) ORDER BY a.pupilid, ST_Distance(a.geometry, b.centrelinegeometry);
Do the same for schools with a view of the target edges.
-- Find the nearest network edge to each school CREATE OR REPLACE VIEW view_target_edges AS SELECT DISTINCT ON (a.schoolname) a.schoolname, b.ogc_fid AS nearest_edge, ST_Distance(a.geometry, b.centrelinegeometry) AS distance FROM view_edn_dataschool a, hw_roadlink b WHERE ST_DWithin(a.geometry, b.centrelinegeometry, 1200) ORDER BY a.schoolname, ST_Distance(a.geometry, b.centrelinegeometry);
Make sure the pupil table has the required fields for the edge calculations
ALTER TABLE edn_pupilschoolroute ADD COLUMN source_eid integer, --source edge id ADD COLUMN source_pos double precision, --source edge position ADD COLUMN target_eid integer, --target edge id ADD COLUMN target_pos double precision, --target edge position ADD COLUMN p_ndist double precision, --node to node distance ADD COLUMN p_ntime double precision, --node to node time ADD COLUMN p_edist double precision, --edge to edge distance ADD COLUMN p_etime double precision; --edge to edge time
The last four columns added are those that are populated by the function. Update the first four columns with the edge ids and positions.
UPDATE edn_pupilschoolroute a SET source_eid = b.source_eid, source_pos = b.source_pos, target_eid = c.target_eid, target_pos = c.target_pos FROM view_source_edges b, view_target_edges c WHERE a.pupilid = b.pupilid AND a.schoolname = c.schoolname;
The function to process all this information into routes goes like this:
-- Function: corporate.pupil_journeys_routes() -- DROP FUNCTION corporate.pupil_journeys_routes(); CREATE OR REPLACE FUNCTION corporate.pupil_journeys_routes() RETURNS character varying AS $BODY$ DECLARE cur_pupil refcursor; v_pid integer; v_seid integer; v_spos float8; v_teid integer; v_tpos float8; v_geom geometry; v_sql varchar(1000); BEGIN RAISE NOTICE 'Processing pupil journeys...'; OPEN cur_pupil FOR EXECUTE format('SELECT pupilid, source_eid, source_pos, target_eid, target_pos FROM edn_pupilschoolroute WHERE pupil_nn IS NOT NULL'); LOOP FETCH cur_pupil INTO v_pid, v_seid, v_spos, v_teid, v_tpos; EXIT WHEN NOT FOUND; SELECT ST_Collect(r.centrelinegeometry) AS geometry FROM pgr_trsp(' SELECT ogc_fid AS id, source::integer, target::integer, cost_len::double precision AS cost, rcost_len::double precision AS reverse_cost FROM hw_roadlink'::text, v_seid, --pupil source edge id v_spos, --source edge position v_teid, --school target edge id v_tpos, --target edge position true, true, 'select to_cost, teid as target_id, feid||coalesce('',''||via,'''') as via_path from hw_nt_restrictions'::text) AS d INNER JOIN hw_roadlink r ON d.id2 = r.ogc_fid INTO v_geom; -- insert route cost into pupil data table EXECUTE format('INSERT INTO %s(pupilid,geometry) VALUES ($1,$2)','corporate.edn_pupilroutes') USING v_pid, v_geom; END LOOP; RETURN 'Oooh, get in!'; CLOSE cur_pupil; END; $BODY$ LANGUAGE plpgsql VOLATILE COST 100;
For 16,000 pupils and 57 schools this took about 2 hours to return a result.