Data Management with gLite LFC
This document introduces file management in gLite. The current gLite middleware keeps interactions with the LCG file catalogue (LFC) separate from interactions with the storage elements (SEs).
Specialized command-line tools handle either the LFC catalogue or the storage elements:
- "lfc-" commands interact with the LFC catalogue server that maps logical filenames to "site URLs".
- "lcg-" commands include ones used to copy files to and from an SE, and to replicate files.
The lcg-* commands also interact with both the SEs and the catalogue server.
Preliminary Operations
Several environment variables must be set before you start, to ensure that the correct catalog service is used.
The default settings for these variables in your account should be correct, but this needs to be checked.
The variables must have exactly these values:
| Variable | Value | Description |
| LCG_GFAL_INFOSYS | egee-bdii.cnaf.infn.it:2170 | Reference BDII |
| LCG_CATALOG_TYPE | lfc | Catalogue type |
| LFC_HOST | lfcserver.cnaf.infn.it | LFC server fully qualified host name |
| LFC_HOME | /grid/yourvo/yourname | LFC home directory |
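The check can be scripted. The following is a minimal sketch, not part of gLite: check_var and check_lfc_env are hypothetical helper names, and the expected values are simply the ones listed in the table above.

```shell
# check_var NAME EXPECTED: compare a variable with the value expected by this
# tutorial (illustrative helper, not a gLite command).
check_var() {
    cv_name=$1
    cv_expected=$2
    eval "cv_actual=\${$cv_name-}"
    if [ "$cv_actual" = "$cv_expected" ]; then
        echo "OK $cv_name=$cv_actual"
    else
        echo "MISMATCH $cv_name='$cv_actual' (expected '$cv_expected')"
        return 1
    fi
}

# check_lfc_env: run all the checks; returns non-zero if any value differs.
check_lfc_env() {
    cl_status=0
    check_var LCG_GFAL_INFOSYS egee-bdii.cnaf.infn.it:2170 || cl_status=1
    check_var LCG_CATALOG_TYPE lfc || cl_status=1
    check_var LFC_HOST lfcserver.cnaf.infn.it || cl_status=1
    # LFC_HOME depends on your VO and user name, so it is only reported.
    echo "LFC_HOME=${LFC_HOME-<unset>}"
    return $cl_status
}
```

After sourcing these definitions, running check_lfc_env in your login shell prints one line per variable and returns a non-zero status if something needs fixing.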
Getting started
For each supported VO a separate "top level" directory exists under the "/grid" directory. There you can see all the files stored for your VO; the examples below use the planck VO. First ensure you have a valid VOMS proxy.
Listing the catalogue
To list the catalogue you can use the following:
[morgan@localhost ~]$ lfc-ls -l
drwxrwxr-x 2 168 113 0 Mar 05 2007 GAUSS
drwxrwxr-x 2 168 113 0 Jan 14 2008 IFCA
drwxr-xr-x 1 root 113 0 Feb 09 2006 PIPPO
drwxrwxr-x 4 307 113 0 Jan 29 2008 ceballos
drwxrwxr-x 4 186 113 0 Jan 14 2007 curto
This command supports almost all the options of the Unix ls command.
Because LFC_HOME is set, only the content of that directory is listed. Use an absolute path to list any other directory, e.g. lfc-ls /grid/planck/GAUSS.
Before creating and uploading any of your own files it is common to create a personal directory for storage by using the
lfc-mkdir command:
[morgan@localhost ~]$ lfc-mkdir /grid/planck/USER
[morgan@localhost ~]$ export LFC_HOME=/grid/planck/USER
Upload file into SE
The next step is to upload a file into the directory you just created. First create a simple text file locally, then copy it to an SE. The command used for this is lcg-cr (LCG copy and register). Type the following to store the file on the grid2.fe.infn.it Storage Element (use lcg-infosites to get the list of SEs):
[morgan@localhost ~]$ echo "delme" > file.txt
[morgan@localhost ~]$ lcg-cr --vo planck -l lfn:/grid/planck/morgan/file.txt -d grid2.fe.infn.it file:/home/morgan/file.txt
guid:ecbf74c6-5257-4ffa-98f2-fbc5c20e0e6d
[morgan@localhost ~]$ lcg-ls -l lfn:/grid/planck/morgan/
-rw-rw-r-- 1 122 113 6 file.txt
Of course, the guid you get will be different, since it is a unique identifier for each file (except when you replicate a file, as we will see). Check that the file is there by listing the contents of your directory.
Before continuing, it is worth noting the difference between the command used to store the file and the one used to create the directory in the previous step. The directory is just a virtual directory and exists only within the catalog of LFNs. The file, on the other hand, physically exists on an SE but has an additional "virtual" filename in the catalog. This is why the commands that only handle the LFN namespace tend to start with "lfc", whilst the commands that manipulate the file directly tend to start with "lcg".
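One practical consequence, visible in the transcripts of this page, is that the lfc-* commands take plain catalog paths (/grid/...) while the lcg-* commands take the same name with an lfn: prefix. A small sketch for converting between the two spellings; to_lfn and to_catalog_path are hypothetical helper names, not gLite commands:

```shell
# Convert between the two spellings of the same logical file name.
to_lfn() {
    # Add the lfn: prefix unless it is already there.
    case $1 in
        lfn:*) printf '%s\n' "$1" ;;
        *)     printf 'lfn:%s\n' "$1" ;;
    esac
}

to_catalog_path() {
    # Strip a leading lfn: prefix, if any.
    printf '%s\n' "${1#lfn:}"
}

# Example on a UI with a valid proxy (not run here):
#   lfc-ls -l "$(to_catalog_path lfn:/grid/planck/morgan/file.txt)"
#   lcg-lr --vo planck "$(to_lfn /grid/planck/morgan/file.txt)"
```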
For several purposes (FTS, for instance) it is useful to know the Storage URL (SURL) of a file; there can be several if the file has replicas elsewhere. The appropriate command is lcg-lr (list replicas), which takes either an LFN or a GUID:
[morgan@localhost ~]$ lcg-lr --vo planck lfn:/grid/planck/morgan/file.txt
srm://grid2.fe.infn.it/planck/generated/2008-08-18/file54ba4262-f1ce-479a-a5a9-0ac7655f5eda
It is also possible to create a symbolic link to this file:
[morgan@localhost ~]$ lfc-ln -s /grid/planck/morgan/file.txt /grid/planck/morgan/newfile.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/
-rw-rw-r-- 1 122 113 6 Aug 18 20:38 file.txt
lrwxrwxrwx 1 122 113 0 Aug 18 20:46 newfile.txt -> /grid/planck/morgan/file.txt
[morgan@localhost ~]$
This link of course has the same Storage URL as its target:
[morgan@localhost ~]$ lcg-lr --vo planck lfn:/grid/planck/morgan/newfile.txt
srm://grid2.fe.infn.it/planck/generated/2008-08-18/file54ba4262-f1ce-479a-a5a9-0ac7655f5eda
Having uploaded a file, the next step is to download it. To download the file through the new LFN you have just created, use lcg-cp:
[morgan@localhost ~]$ lcg-cp --vo planck lfn:/grid/planck/morgan/newfile.txt file:$HOME/test.txt
[morgan@localhost ~]$ cat test.txt
delme
Replicate file between SE
gLite supports file replication. A file can be stored on multiple SEs, so that a running job can access the closest SE holding the file and get faster access to the data. This also protects against failures or access difficulties with a particular SE. With the lcg-rep command we replicate a file, and then we can check the replicas associated with it:
[morgan@localhost ~]$ lcg-rep --vo planck lfn:/grid/planck/morgan/file.txt -d grid003.ca.infn.it
[morgan@localhost ~]$ lcg-lr --vo planck lfn:/grid/planck/morgan/file.txt
srm://grid003.ca.infn.it/dpm/ca.infn.it/home/planck/generated/2008-08-18/fileaab26271-0424-45c2-8f01-a1f5d2049c14
srm://grid2.fe.infn.it/planck/generated/2008-08-18/file54ba4262-f1ce-479a-a5a9-0ac7655f5eda
Note how the path where each replica is stored is different. This shows how using an LFN avoids the need to understand the local filesystem where a replica is actually stored.
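To check the number of replicas in a script, the output of lcg-lr can be post-processed. A minimal sketch, assuming the one-SURL-per-line output shown above; count_replicas and replica_hosts are hypothetical helper names:

```shell
count_replicas() {
    # Count the SURLs (lines starting with srm://) read from standard input.
    grep -c '^srm://'
}

replica_hosts() {
    # Print the SE host name of each SURL read from standard input.
    sed -n 's|^srm://\([^/:]*\).*|\1|p'
}

# Example on a UI with a valid proxy (not run here):
#   lcg-lr --vo planck lfn:/grid/planck/morgan/file.txt | count_replicas
#   lcg-lr --vo planck lfn:/grid/planck/morgan/file.txt | replica_hosts
```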
Remove files and directories
You can delete a file from an SE with lcg-del (notice the -a option, which deletes all replicas):
[morgan@localhost ~]$ lcg-del -a lfn:/grid/planck/morgan/file.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/
It is also possible to remove a directory from the file catalog:
[morgan@localhost ~]$ lfc-mkdir /grid/planck/morgan/test
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/
drwxrwxr-x 0 122 113 0 Aug 18 20:57 test
[morgan@localhost ~]$ lfc-rm -r /grid/planck/morgan/test
Advanced operations on LFC
Suppose you have just created an entry, for example by uploading a file with a valid LFN. You can later change its logical file name, for instance with lfc-rename.
[morgan@localhost ~]$ lcg-cr --vo planck -l lfn:/grid/planck/morgan/test.txt file:/home/morgan/delme.txt -d $VO_PLANCK_DEFAULT_SE
guid:28a7476b-0205-4475-be9b-f12eff0d1fca
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/
-rw-rw-r-- 1 122 113 6 Aug 19 12:35 new.txt
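The transcript above does not show the renaming step between the upload of test.txt and the listing of new.txt. A minimal sketch of that step, assuming the standard lfc-rename client command was used; run and rename_lfn are hypothetical wrappers, and DRY_RUN is an illustrative variable that makes them print the command instead of executing it:

```shell
# run: execute a command, or just print it when DRY_RUN is set
# (illustrative helper, not part of gLite).
run() {
    if [ -n "${DRY_RUN-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rename_lfn() {
    # $1 = current catalog path, $2 = new catalog path (both without lfn:).
    run lfc-rename "$1" "$2"
}

# Example on a UI with a valid proxy (not run here):
#   rename_lfn /grid/planck/morgan/test.txt /grid/planck/morgan/new.txt
```

Only the catalog entry changes; the replicas on the SEs keep their SURLs.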
The variable VO_PLANCK_DEFAULT_SE points to the default SE of the planck VO; it is very useful when working on a WN.
Sometimes a file is stored on a Storage Element but, for some reason, has no logical file name of your choice registered in the catalog. You can fix this with lcg-aa, which adds an alias for the file.
[morgan@localhost ~]$ lcg-cr --vo planck -d grid2.fe.infn.it file:/home/morgan/file.txt
guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583
[morgan@localhost ~]$ lcg-aa --vo planck guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583 lfn:/grid/planck/morgan/test.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan
lrwxrwxrwx 1 122 113 0 Aug 19 12:20 test.txt -> /grid/planck/generated/2008-08-19/file-faf5e641-c9ea-485f-aeea-d59e7450d7e8
[morgan@localhost ~]$ lcg-la --vo planck guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583
lfn:/grid/planck/generated/2008-08-19/file-faf5e641-c9ea-485f-aeea-d59e7450d7e8
lfn:/grid/planck/morgan/test.txt
As you can see, because no LFN was specified at registration time, a default one was generated, and the one you added with lcg-aa is equivalent to a symbolic link.
The result can also be verified with lcg-la (list aliases), as shown above.
It may be useful, mostly for applications, to know the transport URL (TURL) of a file; this is obtained with lcg-gt (get TURL). You must provide the SURL of the file and the desired protocol:
[morgan@localhost ~]$ lcg-lr --vo planck guid:0adcb83f-ebb7-4508-8aa0-14fc57c6c583
srm://grid2.fe.infn.it/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b
[morgan@localhost ~]$ lcg-gt srm://grid2.fe.infn.it/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b gsiftp
gsiftp://grid2.fe.infn.it:2811/storage/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b
02889f57-2afd-410a-86c8-605f1be3d88e
[morgan@localhost ~]$ lcg-gt srm://grid2.fe.infn.it/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b rfio
rfio://grid2.fe.infn.it:5001/storage/planck/generated/2008-08-19/filed7cda750-40a3-43af-870d-55e773d4524b
e237f3a2-a6cb-4341-8f84-cc5ec4919ed7
Notice how the TURL for the same file depends on the requested transport protocol.
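An application usually needs only the TURL itself. A minimal sketch, assuming lcg-gt prints the TURL on its first line as in the transcript above; turl_from_gt and turl_scheme are hypothetical helper names:

```shell
turl_from_gt() {
    # Keep only the first line of the lcg-gt output read from standard input.
    sed -n '1p'
}

turl_scheme() {
    # Print the protocol part of a URL, e.g. gsiftp for gsiftp://host/path.
    printf '%s\n' "${1%%://*}"
}

# Example on a UI with a valid proxy (not run here):
#   turl=$(lcg-gt "$surl" gsiftp | turl_from_gt)
#   turl_scheme "$turl"
```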
The LFC lets you attach a comment to an existing entry (the lfc-setcomment command sets it), which makes LFC entries more meaningful:
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan
-rw-rw-r-- 1 122 113 6 Aug 19 12:35 new.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/new.txt
-rw-rw-r-- 1 122 113 6 Aug 19 12:35 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-ls -l --comment /grid/planck/morgan/new.txt
-rw-rw-r-- 1 122 113 6 Aug 19 12:35 /grid/planck/morgan/new.txt Simulation done for tests
As you can see, the comment is displayed through the --comment option of lfc-ls. You can also delete an existing comment:
[morgan@localhost ~]$ lfc-delcomment /grid/planck/morgan/new.txt
[morgan@localhost ~]$
As in a UNIX filesystem, you can change the permissions associated with an entry through lfc-chmod:
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/new.txt
-rw-rw-r-- 1 122 113 6 Aug 19 12:35 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-chmod 750 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/new.txt
-rwxr-x--- 1 122 113 6 Aug 19 12:35 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-chmod 770 /grid/planck/morgan/new.txt
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/new.txt
-rwxrwx--- 1 122 113 6 Aug 19 12:35 /grid/planck/morgan/new.txt
The LFC also offers a powerful instrument, access control lists (ACLs), which let the owner of a file or directory fine-tune the access rights granted to any other user on that entry. For example, let's create a new directory and see its default access rights with lfc-getacl:
[morgan@localhost ~]$ lfc-mkdir /grid/planck/morgan/accesso
[morgan@localhost ~]$ lfc-getacl !$
lfc-getacl /grid/planck/morgan/accesso
# file: /grid/planck/morgan/accesso
# owner: /C=IT/O=INFN/OU=Personal Certificate/L=INAF Trieste/CN=Giuliano Taffoni
# group: planck
user::rwx
group::rwx #effective:rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x
Note that ownership is expressed through the DN of the certificate, while the group is expressed through VO membership. The listing then shows the current ACL for the entry:
- user and group have full privileges
- others can read and traverse the directory (r-x)
Finally it shows the defaults, which apply to each new entry created within this directory.
Now let's change the default ACL with lfc-setacl, giving read/write permission to user and group, and no privileges to others.
The syntax is: modify (-m) the default (d:) entry for the user (u::), and likewise for group (g::) and others (o::).
Rights follow the usual UNIX convention (7 all, 6 rw, ...).
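The command itself is not shown in the transcript. A minimal sketch of how the ACL specification described above could be assembled and passed to lfc-setacl; default_acl_spec is a hypothetical helper, and the exact invocation in the comment is an assumption reconstructed from the description, so check the lfc-setacl man page on your UI:

```shell
default_acl_spec() {
    # $1, $2, $3 = permissions (0-7) for user, group and others.
    # Builds the default (d:) entries in the syntax described above.
    printf 'd:u::%s,d:g::%s,d:o::%s\n' "$1" "$2" "$3"
}

# Example on a UI with a valid proxy (not run here):
#   lfc-setacl -m "$(default_acl_spec 6 6 0)" /grid/planck/morgan/accesso
#   lfc-getacl /grid/planck/morgan/accesso
```

With 6 6 0 the helper prints d:u::6,d:g::6,d:o::0, which matches the -rw-rw---- permissions of the file uploaded in the next step.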
[morgan@localhost ~]$ lcg-cr --vo planck -n 8 -l lfn:/grid/planck/morgan/accesso/due.txt file:/home/morgan/delme.txt -d grid2.fe.infn.it
guid:4a5b5c0a-98c9-4fa5-9360-f0350f3e89e8
[morgan@localhost ~]$ lfc-ls -l /grid/planck/morgan/accesso
-rw-rw---- 1 122 113 6 Aug 19 12:53 due.txt
Notice that the new default ACL set on the directory applies to every file subsequently created in it.
Notice also that lcg-cr supports multi-channel data transmission (as GridFTP does): the -n option sets the number of channels (8 in the example above).
Gilda LFC recursive access tools
The Gilda group also provides a downloadable set of tools that operate recursively on a catalog directory.
These tools can:
- copy a whole catalog directory tree from the file catalog to the local filesystem;
- delete a whole catalog directory tree, also removing the related files (all replicas) from the storage elements;
- copy and register a whole directory tree into the file catalog.
lcg-rec-cr - Recursive copy and register
This command copies the whole source directory structure from the local filesystem to a storage element and registers it in the file catalog.
[morgan@localhost ~]$ lcg-rec-cr
lcg-rec-cr
----------
LCG utils based recursive copy and register to file catalog utility
Usage: lcg-rec-cr -vo -lp -cp [-fc ] [-se ] [-v]
The mandatory parameters are:
- -vo The user needs to specify the virtual organization.
- -lp The local path from which the user wants to copy recursively to the file catalog/SE.
- -cp The catalog path that will be used as a starting point in the file catalog.
Optional parameters are:
- -fc Changes the default file catalog, normally defined by the content of the $LFC_HOST environment variable.
- -se Specifies a destination SE for the file storage; if it is not specified, the default SE (VO_COMETA_DEFAULT_SE) is used.
- -v Switches the command to verbose mode. Summary info is printed at the top and all the lcg-*/lfc-* commands used are shown.
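To make the idea concrete, here is a rough sketch of what a recursive copy and register amounts to in terms of the commands used earlier on this page. It is not the Gilda tool's implementation: plan_rec_cr is a hypothetical function that only prints the lfc-mkdir and lcg-cr commands it would need, one per directory and per file, so that the plan can be inspected before anything is uploaded.

```shell
plan_rec_cr() {
    # $1 = VO, $2 = local directory (absolute path),
    # $3 = catalog directory, $4 = destination SE.
    pr_vo=$1
    pr_local=${2%/}
    pr_cat=${3%/}
    pr_se=$4
    # One lfc-mkdir per directory, parents first.
    find "$pr_local" -type d | sort | while IFS= read -r pr_dir; do
        echo "lfc-mkdir -p $pr_cat${pr_dir#"$pr_local"}"
    done
    # One lcg-cr per regular file, keeping the relative path as the LFN.
    find "$pr_local" -type f | sort | while IFS= read -r pr_file; do
        echo "lcg-cr --vo $pr_vo -d $pr_se -l lfn:$pr_cat${pr_file#"$pr_local"} file:$pr_file"
    done
}

# Example (prints commands only):
#   plan_rec_cr planck /home/morgan/data /grid/planck/morgan/data grid2.fe.infn.it
```

Printing the commands rather than running them keeps the sketch safe to try; paths containing spaces would need extra quoting before the output could be executed.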
lcg-rec-cp - Recursive copy from file catalog/SE to local filesystem
This command copies from the file catalog/storage elements into the local filesystem, keeping the original file hierarchy registered in the file catalog.
To get help about the usage of this command, use the -h (or --help) option.
[morgan@localhost ~]$ lcg-rec-cp
lcg-rec-cp
----------
LCG utils based recursive copy from file catalog utility
Usage: lcg-rec-cp -vo -cp -lp [-fc ] [-v]
The mandatory parameters are:
- -vo The user needs to specify the virtual organization.
- -lp The local path to which the user wants to copy recursively from the file catalog/SE.
- -cp The catalog path that will be used as a starting point in the file catalog.
Optional parameters are:
- -fc Changes the default file catalog, normally defined by the content of the $LFC_HOST environment variable.
- -v Switches the command to verbose mode. Summary info is printed at the top and all the lcg-*/lfc-* commands used are shown.
lcg-rec-del - Recursive deletion from file catalog/SE
This command recursively deletes files and directories registered in the file catalog and removes all replicas of the file content from the storage elements.
To get help about the usage of this command, use the -h (or --help) option.
[morgan@localhost ~]$ ./lcg-rec-del -h
lcg-rec-del
----------
LCG utils based recursive delete from file catalog utility and related replicas
Usage: lcg-rec-del -vo -cp [-fc ] [-v]
The mandatory parameters are:
- -vo The user needs to specify the virtual organization.
- -cp The catalog path that will be used as a starting point in the file catalog.
Optional parameters are:
- -fc Changes the default file catalog, normally defined by the content of the $LFC_HOST environment variable.
- -v Switches the command to verbose mode. Summary info is printed at the top and all the lcg-*/lfc-* commands used are shown.
--
TaffoniGiuliano - 18 Aug 2008