Warrick Mitchell, AARNET ([email protected])
Breanna Meade, Indiana University ([email protected])
Introduction and Background
Network owners and users need to ensure that data moves over research and education (R&E) circuits correctly so that data transfers perform well. Experience has shown that adding or removing capacity can have unexpected routing results that are difficult to detect automatically and impossible to correct without coordination. The anomalous flows can be identified fairly easily with current tools, but adjusting the erroneous paths still requires work across the R&E networking community, which is complicated by the number of organizations involved. This type of routing problem is prevalent and growing as more capacity is added along different routes, and we expect it is an issue many national research and education networks (NRENs) will need to address in order to maintain a robust, reliable, high-speed global R&E network.
Some of the problems we currently see with R&E data transfers and routes include:
- Data taking a longer route than necessary, for example, unnecessarily crossing oceans.
- Traffic taking an unexpected route, for example, hitting two routers in a single exchange point, before and after passing through a third (likely unnecessary) exchange point.
- Traffic being routed over commercial capacity instead of remaining on R&E capacity.
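The three symptoms above can be checked mechanically once a traced path has been annotated. The following is a minimal sketch, not a description of any existing tool: the `Hop` record, its fields, and the router and exchange names are hypothetical, and it assumes each hop has already been labeled with a region, an exchange point (if any), and whether it sits on commercial capacity. Leaving and re-entering a region is used here only as a rough heuristic for an unnecessarily long route.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class Hop:
    """One annotated hop of a traced path (hypothetical record layout)."""
    router: str
    region: str              # e.g. "Oceania", "North America"
    exchange: Optional[str]  # exchange point name, or None if not at one
    commercial: bool         # True if the hop is on commercial capacity


def revisited(labels):
    """Return labels that reappear after the path has moved on to a different label."""
    runs = [label for label, _ in groupby(labels)]  # collapse consecutive repeats
    seen, repeats = set(), []
    for label in runs:
        if label in seen and label not in repeats:
            repeats.append(label)
        seen.add(label)
    return repeats


def find_anomalies(path):
    """Flag the three route symptoms listed above for one annotated path."""
    findings = []
    # Heuristic for a longer-than-necessary route: a region is left and re-entered.
    for region in revisited(hop.region for hop in path):
        findings.append(f"path leaves and re-enters region {region}")
    # Same exchange point seen before and after a different exchange point.
    exchanges = [hop.exchange for hop in path if hop.exchange is not None]
    for exchange in revisited(exchanges):
        findings.append(f"path re-enters exchange point {exchange}")
    # Any hop on commercial rather than R&E capacity.
    for hop in path:
        if hop.commercial:
            findings.append(f"hop {hop.router} is on commercial capacity")
    return findings


# Made-up path for illustration only.
example_path = [
    Hop("r1", "Oceania", None, False),
    Hop("r2", "North America", "IX-A", False),
    Hop("r3", "North America", "IX-B", False),
    Hop("r4", "North America", "IX-A", False),
    Hop("r5", "Oceania", None, True),
]

for finding in find_anomalies(example_path):
    print(finding)
```

In practice the annotations would have to come from measurement and inventory data that the participating networks maintain; the sketch only shows that, given such labels, the checks themselves are simple.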
This working group (WG) will provide a natural context for this work, given that these issues are inter-regional, affecting facilities and sites in several regions of the world, and that resolving them requires cooperation across the community.
This working group will engage network owners and NRENs not only to discuss and address ineffective routes reactively, but also to work proactively across the community to create policies that systematically prevent them from occurring. This working group will be jointly chartered by APAN.
Goals of the Group
While the overarching goal of the Routing Working Group is to identify and resolve ineffective routes across the global R&E networks, and to ensure that R&E traffic uses R&E capacity as often as possible, the work will proceed along two parallel but complementary tracks (with personnel shared between them):
- From an engineering point of view, this will include promoting and supporting advanced tools to identify and resolve routing issues, such as TraceRoute, RouteViews, NetSage, and Looking Glass (Router Proxy). It will also involve substantial collaboration between network owners and operators to resolve existing routing problems.
- From a policy point of view, this will include working with higher-level network operators to document their routing policies in enough detail that observed traffic can be verified against them.
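As an illustration of the policy track, the sketch below shows one way a documented routing policy could be checked against an observed path. It is an assumption-laden example rather than a proposed format: the `RoutePolicy` structure and `check_path` function are hypothetical, the policy is reduced to a set of permitted transit networks between two endpoints, and the AS numbers are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutePolicy:
    """Hypothetical machine-readable policy for traffic between two networks."""
    source_asn: int
    destination_asn: int
    permitted_transit: frozenset  # ASNs allowed between source and destination


def check_path(policy, as_path):
    """Return a list of policy violations for one observed AS path."""
    if (
        not as_path
        or as_path[0] != policy.source_asn
        or as_path[-1] != policy.destination_asn
    ):
        return ["path endpoints do not match the policy"]
    # Every network strictly between the endpoints must be a permitted transit.
    return [
        f"AS{asn} is not a permitted transit network"
        for asn in as_path[1:-1]
        if asn not in policy.permitted_transit
    ]


# Placeholder AS numbers, for illustration only.
policy = RoutePolicy(
    source_asn=64496,
    destination_asn=64499,
    permitted_transit=frozenset({64497, 64498}),
)

print(check_path(policy, [64496, 64497, 64499]))  # path stays on permitted transit
print(check_path(policy, [64496, 64500, 64499]))  # path uses an unlisted network
```

Real policies are likely to be richer than a single allow-list (for example, preference orderings or per-prefix rules), so this only indicates the kind of verification that becomes possible once policies are written down precisely.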
The following deliverables are planned for the Routing Working Group:
- A web page, Slack workspace, mailing list, and other collaborative tools to enable real-time collaboration and investigation.
- Tool Talks for the broader community during the first year to help R&E network operators and owners become more familiar with the tools available in this space.
- Meetings to take place (at least monthly?) for the engineers to discuss the resolution of known routing issues.
- An ongoing inventory of detected and resolved routing issues, along with the list of participants assigned to each case. These issues continue to emerge with no sign of slowing or stopping, which necessitates the long-term existence of such a group.
- A process workflow to resolve detected routing issues, to be augmented over time via practice.
- A list of routing policies for NRENs and network owners to be used to verify possible routing problems. (This may also need corresponding meetings?)
- Consider defining a process to review and adjust routing when additional R&E network capacity is added, removed, or changed.
- Consider developing tools or dashboards to more easily identify routing issues as they arise over time.