short definition

A command line interface for managing CLARiiON storage. Short for Navisphere cli.

background

host

The system I'm referring to while writing this file is a 5 board e10k domain with 3 CLARiiON 5700's attached to it.

each CLARiiON cabinet has:

naviagent runs on a host that is connected to the cabinets. navicli or the Navisphere GUI communicates with naviagent.

Each LUN[2] on a cabinet has a default SP and a current SP. If both SPs are functional, each LUN will be on its default SP. If one of the SPs is not functional, you may (depending on whether or not you have ATF[3] installed) have to manually fail over a LUN to the functional SP, which then becomes the LUN's current SP, or current owner. All traffic to a LUN goes over the connection to its current SP.

Basically you "bind" (create) LUNs, which appear to be regular disks to Solaris (or whatever operating system you use). People usually put these into vxvm in case they want to move the cabinets around (by deporting the diskgroup(s) in the cabinet).

organization of objects or whatever

raid groups contain LUNs.

Multiple LUNs can exist in a single raid group, but I usually use a separate raid group for each LUN. Unbinding a LUN sometimes doesn't remove the raid group that contains it. It's good practice, when unbinding a LUN, to first note the raid group that contains it and then make sure that the raid group has been removed as well (obviously don't do that if you have multiple LUNs per raid group, heh).

starting and stopping naviagent / config files

Bouncing the agent isn't a big deal, as it is only used for management.

starting the agent

# /etc/init.d/agent start

stopping the agent

# /etc/init.d/agent stop

/etc/Navisphere/agent.config is the config file for naviagent.

# more /etc/Navisphere/agent.config
clarDescr Navisphere Agent
clarContact John Smith, 800-555-1212

device c2t0d4s2 DPE-A_SPA "DPE-A_SPA"
device c4t1d5s2 DPE-A_SPB "DPE-A_SPB"

device c3t4d2s2 DPE-B_SPA "DPE-B_SPA"
device c5t5d3s2 DPE-B_SPB "DPE-B_SPB"

device c6t3d0s2 DPE-C_SPA "DPE-C_SPA"
device c7t2d1s2 DPE-C_SPB "DPE-C_SPB"

user root@127.0.0.1
user guiweakling@10.0.0.25

array 1234567 Cabinet-1
array 7654321 Cabinet-2
array 3214435 Cabinet-3
#

The only meaningful things in this file are the device and user lines. A CLARiiON cabinet can be managed via navicli if you know the device of any disk on that cabinet. It's easier to figure that stuff out once and then put it in your agent.config. Once that is done, you can run navicli getagent to quickly determine which disks you need to use to manage a given cabinet.
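As a rough sketch of pulling just the device lines back out of the file, something like the following could be used. list_agent_devices is a made-up helper name (not part of Navisphere), and it assumes the "device <disk> <label> ..." layout shown in the listing above.

```shell
# Hypothetical helper: print the disk and label from each "device" line
# of an agent.config-style file.
list_agent_devices() {
    # $1: path to the config file (defaults to the path used in this document)
    awk '$1 == "device" { print $2, $3 }' "${1:-/etc/Navisphere/agent.config}"
}
```

For example, list_agent_devices | grep DPE-B would show the two disks that can be used to reach the second cabinet in the config above.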

The user lines define which users can manage the storage. user root@127.0.0.1 indicates that the local root user is allowed to manage the storage. Usually people don't have navicli installed on another sun box to manage remote storage, so lines like user guiweakling@10.0.0.25 generally indicate that someone is using the NT navisphere gui at that host.

device c2t0d4s2 DPE-A_SPA "DPE-A_SPA"

DPE-A_SPA and "DPE-A_SPA" are arbitrary strings used by the user to identify which cabinet corresponds with which device.

array 1234567 Cabinet-1

Cabinet-1 is also an arbitrary string, and the number before it is the serial number of the cabinet. I'm pretty sure the array lines only get used by the Navisphere gui (and who wants to use a gui).

navicli

navicli is actually a bunch of commands all rolled into one. If you run it with no arguments, it will barf out a list of them:

# navicli
navicli [-p] [-v|q] [-m] [-np] [-t timeout] [-h hostname]
  [-d device] [-help] CMD <optional-args>

Possible commands are:  accesscontrol  arrayname   bind     chglun
chgrg       clearlog    clearstats  createrg    failback    fairness
firmware    getagent    getatf      getcache    getcontrol  getcrus
getdisk     getlog      getloop     getlun      getrg       getsniffer
port
r3wrbuff    rebootSP    removerg    register    setcache    setloop
setraid5    setsniffer  setspstime  setstats    storagegroup systemtype
trespass    unbind      SC_OFF

#

The commands focused on here are bind, getagent, getdisk, getlog, getlun, removerg, trespass, and unbind.

Usually a command without any parameters will give you all the information available. Adding extra parameters makes the command return only the information you are looking for. For example:

# navicli -d c2t0d4s2 getlun 0

That returns a lot of information, but say you only want to know the size of the LUN:

# navicli -d c2t0d4s2 getlun 0 -capacity
Lun Capacity:               134903
#

getagent

Running "navicli getagent" with no arguments returns a lot of info. Usually you know which controllers[4] your stuff is connected to (c2 and c4 are cabinet 1, c3 and c5 are cabinet 2, c6 and c7 are cabinet 3), so instead of looking at format or something, you can do:

# navicli getagent -node

Node:           c2t0d4s2

Node:           c4t1d5s2

Node:           c3t4d2s2

Node:           c5t5d3s2

Node:           c6t3d0s2

Node:           c6t3d0s2

Node:           c7t2d1s2

#
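If you only care about one controller, the node list can be filtered. This is a minimal sketch that assumes the "Node: <device>" output format shown above; nodes_on_controller is a hypothetical helper, not a navicli feature.

```shell
# Hypothetical helper: read "navicli getagent -node" output on stdin and
# print each distinct device that sits on the given controller (e.g. c2).
nodes_on_controller() {
    # Matching on "<ctrl>t" keeps c2 from also matching c20.
    awk -v ctrl="$1" \
        '$1 == "Node:" && index($2, ctrl "t") == 1 && !seen[$2]++ { print $2 }'
}
```

Usage would look like: navicli getagent -node | nodes_on_controller c3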

Every other command takes a -d parameter to specify a device, e.g.:

# navicli -d c2t0d4s2 getlun 0

It doesn't matter which SP you use. If I ran that same command with a -d of c4t1d5s2, it would return the same result, because disks on controllers 2 and 4 are cabinet 1 in this setup.

getlun

getlun tells you a lot of information about a LUN. Most importantly it tells you: which physical disks are involved in the LUN, the LUN capacity, the default SP, the current SP, and the raid type.

syntax: navicli -d [device] getlun [LUN] <options>

# navicli -d c2t0d4s2 getlun 0
...
RAID Type:                  RAID5
RAIDGroup ID:               0
State:                      Bound
...
Current owner:              SP B
...
Default Owner:              SP B
...
Prct Rebuilt:               100
Prct Bound:                 100
Lun Capacity:               134903
...

Enclosure  0 Disk 0  Enabled
...
Enclosure  0 Disk 1  Enabled
...
Enclosure  0 Disk 2  Enabled
...
Enclosure  0 Disk 3  Enabled
...
Enclosure  0 Disk 4  Enabled
...

#

Disks 0-4 in enclosure 0 are involved in a 134903 meg raid5 set.
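Since getlun reports both owners, a quick check for whether a LUN has moved off its default SP could look like the sketch below. It assumes the "Current owner:" and "Default Owner:" labels exactly as they appear in the output above (including the odd capitalisation); lun_owner_status is a hypothetical helper name.

```shell
# Hypothetical helper: read getlun output on stdin and report whether the
# LUN is currently on its default SP.
lun_owner_status() {
    awk '
        /^Current owner:/ { sub(/^[^:]*:[ \t]*/, ""); cur = $0 }
        /^Default Owner:/ { sub(/^[^:]*:[ \t]*/, ""); def = $0 }
        END {
            if (cur == "" || def == "") {
                print "unknown"
                exit 2
            }
            if (cur == def) {
                print "on default SP (" def ")"
            } else {
                print "trespassed: default " def ", current " cur
            }
        }'
}
```

Usage would look like: navicli -d c2t0d4s2 getlun 0 | lun_owner_status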

bind

Binds (creates) a LUN. Binding a LUN is a destructive action: all data on the disks is zeroed out (which is the reason it takes so long to bind a LUN).

syntax: navicli -d [device] bind [raid type] [LUN] [disk ...] <options>

CLARiiONs support the following raid types (from navicli man page):

The LUN you specify must not already exist (duh). You want to alternate binding LUNs with default SPs of sp-a and sp-b to split the i/o load. Let's bind a 10 disk raid 5 (r5) set on sp-b (let's pretend that LUN 9 doesn't exist and that none of the disks on enclosure 8 are being used):

# navicli -d c2t0d4s2 bind r5 9 8_0 8_1 8_2 8_3 8_4 8_5 8_6 8_7 8_8 8_9 -sp b
#
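Typing out ten disk names gets old. The disk arguments in the command above all follow the same enclosure_disk pattern, so the list can be generated. A minimal sketch, with enclosure_disks being a made-up helper name:

```shell
# Hypothetical helper: print a space-separated list of disk names for one
# enclosure, e.g. "8_0 8_1 ... 8_9".
enclosure_disks() {
    # $1: enclosure number, $2: first disk, $3: last disk
    ed_list=""
    ed_i=$2
    while [ "$ed_i" -le "$3" ]; do
        ed_list="$ed_list ${1}_$ed_i"
        ed_i=$((ed_i + 1))
    done
    # Drop the leading space left over from the first append.
    printf '%s\n' "${ed_list# }"
}
```

The bind above could then be written as: navicli -d c2t0d4s2 bind r5 9 $(enclosure_disks 8 0 9) -sp b. Since binding is destructive, echo the generated list and check it before using it for real.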

The format (as you have noticed) is enclosure_disk[5]. The command returns fast, since it merely tells the SP to bind that LUN. To check the binding status:

# navicli -d c2t0d4s2 getlun 9 -state
State:                      Binding
#

OK, so it's binding... how far along is it?

# navicli -d c2t0d4s2 getlun 9 -bind
Prct Bound:                 10
#

When finished, the state of the lun will be bound (percent bound will be 100).
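Rather than re-running those two commands by hand, the check can be wrapped in a polling loop. This is only a sketch: wait_for_bind is a hypothetical helper, and it assumes the state strings "Binding" and "Bound" exactly as shown in the getlun output in this document.

```shell
# Hypothetical helper: poll a LUN until it reports "Bound".
wait_for_bind() {
    # $1: device, $2: LUN, $3: seconds between polls (default 60)
    while :; do
        wb_state=$(navicli -d "$1" getlun "$2" -state |
            awk -F': *' '/^State:/ { print $2 }')
        case $wb_state in
            Bound)
                echo "LUN $2 is bound"
                return 0
                ;;
            Binding)
                # Show the percentage while we wait.
                navicli -d "$1" getlun "$2" -bind
                ;;
            *)
                echo "unexpected state: ${wb_state:-none}" >&2
                return 1
                ;;
        esac
        sleep "${3:-60}"
    done
}
```

Usage would look like: wait_for_bind c2t0d4s2 9 300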

unbind

unbinds (destroys) a LUN.

syntax: navicli -d [device] unbind [LUN] <-o> <options>

The -o option prevents navicli from prompting you with "do you really want to unbind? blah blah". Sometimes unbind doesn't remove the raid group (perhaps I was imagining that, but let's be safe, heh).

# navicli -d c2t0d4s2 getlun 9 -rg
RAIDGroup ID:               9
#

Note: the raid group ID will sometimes differ from the LUN number. Don't assume anything.

# navicli -d c2t0d4s2 unbind 9 -o
# navicli -d c2t0d4s2 removerg 9 -o
#
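Because the raid group ID has to be looked up before the LUN is gone, it can help to generate the two commands up front and eyeball them. The sketch below only prints the commands and destroys nothing; unbind_plan is a hypothetical helper, and it assumes the "RAIDGroup ID:" label shown in the getlun -rg output above.

```shell
# Hypothetical helper: look up a LUN's raid group and print (not run) the
# unbind and removerg commands for it.
unbind_plan() {
    # $1: device, $2: LUN
    up_rg=$(navicli -d "$1" getlun "$2" -rg |
        awk -F': *' '/^RAIDGroup ID:/ { print $2 }')
    if [ -z "$up_rg" ]; then
        echo "could not determine raid group for LUN $2" >&2
        return 1
    fi
    echo "navicli -d $1 unbind $2 -o"
    # Only safe if no other LUNs live in this raid group; check first.
    echo "navicli -d $1 removerg $up_rg -o"
}
```

Usage would look like: unbind_plan c2t0d4s2 9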

getdisk

Gets information about a disk (gee, that's a surprise): capacity, and a bunch of other stuff.

syntax: navicli -d [device] getdisk [disk] <options>

# navicli -d c2t0d4s2 getdisk 0_0
Enclosure  0 Disk 0
Vendor Id:            SEAGATE
Product Id:           ST136403 CLAR36
Product Revision:     3844
Lun:                  0
Type:                 0: RAID5
State:                Enabled
Hot Spare:            0: NO
Prct Rebuilt:         0: 100
Prct Bound:           0: 100
Serial Number:        LT321123
Sectors:              69070464 (33725)
Capacity:             35458
Private:              0: 184320
Bind Signature:       0x13ac, 0, 0
Hard Read Errors:     0
Hard Write Errors:    0
Soft Read Errors:     0
Soft Write Errors:    0
Read Retries:         2714
Write Retries:        51
Remapped Sectors:     0
Number of Reads:      5270941
Number of Writes:     2810246
Number of Luns:       1
Raid Group ID:        0
#

This disk is a 36 gig (35458 meg, from the Capacity line). It's part of a raid 5 set on LUN 0, raid group 0, etc, etc.
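Most of that output is noise when you just want to know whether a disk is unhappy. A rough sketch that picks out only the non-zero error counters, assuming the "... Errors: <count>" lines shown above (disk_error_summary is a hypothetical helper, and which counters actually matter is a judgment call):

```shell
# Hypothetical helper: read getdisk output on stdin and print any error
# counter with a non-zero value.
disk_error_summary() {
    awk -F': *' '
        / Errors:/ && $2 + 0 > 0 { print $1 "=" $2; bad = 1 }
        END { if (!bad) print "no errors reported" }'
}
```

Usage would look like: navicli -d c2t0d4s2 getdisk 0_0 | disk_error_summary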

trespass

trespass will failover all the LUNs on a failed SP to the functional SP.

syntax: navicli -d [device] trespass all

You don't have to use trespass all; you could specify individual LUNs instead. Here's an example: say sp-a fails on cabinet 1 and we can no longer see the LUNs on it in Solaris. Run:

# navicli -d c4t1d5s2 trespass all
#

There we are. Another important thing: notice how I used c4, because sp-a failed and we could no longer see those disks in Solaris? If I had used -d c2t0d4s2, it wouldn't have worked. Along the same lines, if you happen to unbind a LUN that you have specified as a device in /etc/Navisphere/agent.config, you need to look at format, find a disk that's on that controller, and change the entry for that SP to that disk. It doesn't matter which disk, just as long as it's on the same controller.

getlog

getlog gets the SP log, which is useful for troubleshooting (and for sending to EMC when you get really scary errors and/or failures).

syntax: navicli -d [device] getlog

# navicli -d c2t0d4s2 getlog
lots of output, heh
#

references

http://www.cuddletech.com/veritas/raidtheory.html
http://www.emc.com/products/systems/clariion.jsp?openfolder=storage_systems

  1. Storage Processor. The component that handles all of the raid. The SP has the interfaces on it.
  2. Logical Unit Number. I swear there used to be a LUN node.
  3. Application Transparent Failover. A piece of software that sits between the fibre channel drivers and the operating system that allows for automatic failover of a LUN from one controller to another without interrupting any applications accessing filesystems that exist on said LUN.
  4. Meaning disk controller in solaris. Related to which HBA(s) the SPs on the cabinet are attached to.
  5. enclosure 0 is the bottom enclosure. disk 0 is at the left of the enclosure.