The idea is simple, two subnets (separate networks) and then route packets from one to the other. The environment, however, is not symmetric. We wanted to contact a node on the other subnet and we could see the packets travelling over the switch to the router back through another switch to the node, but the node itself refused to reply.
Each node has two NICs and each NIC is connected to a separate network. If you try to connect or ping one node from another, Linux is smart enough to go directly over the NIC with the right network. If a NIC should ever fail, the failover is that the packets are then routed up one network to the router then over to the other network.
My current project that involves hundreds of mini-ITX Atom machines and we are testing the performance difference between Infiniband and Intel Gigabit NICs.
In my testing the overhead of processing TCP is too high for a dual-core Atom. There is simply not enough processing power to handle the capabilities of the Intel NICs.
A possible solution is to replace TCP by using SDP (RDMA and Zerocopy) over Infiniband. Infiniband equipment has come down significantly in price (dual port 4xSDR card for around $50), which makes it attractive to high-performance and cost-sensitive applications like mine.
In theory we can get 4xSDR speeds (8 Gigabit/s), but the tested result is 1.5 Gigabit/s speeds because of TCP processing over Infiniband. This is almost exactly the performance we achieved using the Intel NICs. We then replaced TCP with SDP over Infiniband. With the switch we saw 4.2 Gigabits/s performance on one process. With two processes, one for each core of the Atom, we saw 7.8 Gigabit/s which is close to the theoretical limit of the Infiniband NIC.