[lustre-devel] Is network interface fail-over on same server on the road map?

Brian O'Connor briano at sgi.com
Wed Aug 8 19:41:57 PDT 2012


Hi

    At present, AFAIK, a Lustre connection will not fail over between NIDs 
on the same server. For example, if you have two MDS servers, each with two 
IB interfaces and one Ethernet interface, configured as an HA pair, you can 
configure the client to *mount* via whichever NID it can see, but 
once the connection is made, if the network switch, cable or HCA fails, 
the client cannot fail over to another NID on the *same* server; 
it fails over to a NID on the HA partner (and the resources probably won't 
be on the HA partner).

So my question is: is this feature on a road map, or is it just not 
compatible with other aspects of Lustre?

So, to try to be clear, assume the following:

Servers
MDS1:  ib0=192.168.1.1/24, ib1=192.168.2.1/24, eth0=10.0.0.1/24
MDS2:  ib0=192.168.1.2/24, ib1=192.168.2.2/24, eth0=10.0.0.2/24
OSS1:  ib0=192.168.1.11/24, ib1=192.168.2.11/24, eth0=10.0.0.11/24
..
OSS20: ib0=192.168.1.30/24, ib1=192.168.2.30/24, eth0=10.0.0.30/24

Clients

c1:    ib0=192.168.1.101/24, ib1=192.168.2.101/24, eth0=10.0.0.101/24
..
c100:  ib0=192.168.1.200/24, ib1=192.168.2.200/24, eth0=10.0.0.200/24

I mount on the client with

mount -t lustre \
  192.168.1.1@o2ib,192.168.2.1@o2ib,10.0.0.1@tcp:192.168.1.2@o2ib,192.168.2.2@o2ib,10.0.0.2@tcp:/lustre \
  /lustre

(and set up to load-balance odd/even clients across odd/even IPs on the servers)
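
To make the structure of that mount source explicit, here is a small Python sketch. It is not Lustre code: the function name parse_mount_source is made up for illustration, and the NIDs are written in the usual address@network form. It only relies on the mount syntax itself, where commas separate the NIDs of one server, colons separate failover servers, and the filesystem name follows the final ":/".

```python
def parse_mount_source(source):
    """Split a Lustre mount source into per-server NID lists and a filesystem name.

    Hypothetical helper for illustration only; assumes IPv4 NIDs (no colons
    inside an address).
    """
    # Everything after the last ":/" is the filesystem name.
    nid_part, _, fsname = source.rpartition(":/")
    # Colons separate servers; commas separate the NIDs of a single server.
    groups = [group.split(",") for group in nid_part.split(":")]
    return groups, fsname


source = (
    "192.168.1.1@o2ib,192.168.2.1@o2ib,10.0.0.1@tcp:"
    "192.168.1.2@o2ib,192.168.2.2@o2ib,10.0.0.2@tcp:/lustre"
)
groups, fsname = parse_mount_source(source)

# One line per server in the HA pair, listing the NIDs that belong to it.
for index, nids in enumerate(groups, start=1):
    print(f"MDS{index}: {', '.join(nids)}")
print(f"filesystem: {fsname}")
```

The first group holds the three NIDs of MDS1 and the second the three NIDs of MDS2, which is the distinction the rest of this mail depends on.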

Client "c1" mounts initially via 192.168.1.1@o2ib. If the client later 
fails to communicate on 192.168.1.1@o2ib, at present it will *not* try 
192.168.2.1 or 10.0.0.1; it will try 192.168.1.2, 192.168.2.2 or 
10.0.0.2 on the configured HA partner. This complicates the HA setup, in 
that you have to monitor the networks and STONITH the MDS/OSS nodes so 
that the resources are available on the HA partner when needed, and so far 
this has never worked out for me.
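
The difference between the behaviour described above and the behaviour being asked about can be written down as a toy model in Python. This is only a sketch of the observed ordering, not how Lustre's connection code is implemented; the names SERVERS, server_of, candidates_today and candidates_wanted are all hypothetical.

```python
# Toy model of the two-node MDS pair from the example; not Lustre internals.
SERVERS = {
    "MDS1": ["192.168.1.1@o2ib", "192.168.2.1@o2ib", "10.0.0.1@tcp"],
    "MDS2": ["192.168.1.2@o2ib", "192.168.2.2@o2ib", "10.0.0.2@tcp"],
}


def server_of(nid):
    """Return the name of the server that owns the given NID."""
    for name, nids in SERVERS.items():
        if nid in nids:
            return name
    raise ValueError(f"unknown NID: {nid}")


def candidates_today(failed_nid):
    """Observed behaviour: only NIDs on the HA partner are tried."""
    owner = server_of(failed_nid)
    return [nid for name, nids in SERVERS.items() if name != owner for nid in nids]


def candidates_wanted(failed_nid):
    """Requested behaviour: remaining NIDs on the same server first, then the partner."""
    owner = server_of(failed_nid)
    same_server = [nid for nid in SERVERS[owner] if nid != failed_nid]
    return same_server + candidates_today(failed_nid)


failed = "192.168.1.1@o2ib"
print("tried today: ", candidates_today(failed))
print("tried wanted:", candidates_wanted(failed))
```

With the requested ordering, a failed switch, cable or HCA on one network would be absorbed by the same server, and STONITH would only be needed when the server itself is gone.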

Is there an architectural reason that you can't fail over to another 
NID on the same server?



-- 
Brian O'Connor
-------------------------------------------------------------
SGI Consulting
Email: briano at sgi.com, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax:  +61 3 9963 1902
691 Burke Road, Camberwell, Victoria, 3124
AUSTRALIA
http://www.sgi.com/support/services
-------------------------------------------------------------

