Linux Admin Reference – GFS2 Filesystem configuration in Clustered Environment – RHCS Part1
What is the Purpose of LVM in a Clustered Environment and What are the Limitations?
The Red Hat GFS2 file system is a native file system that interfaces directly with the Linux kernel file system interface (VFS layer). Red Hat supports the use of GFS2 file systems only as implemented in Red Hat Cluster Suite. GFS2 is based on a 64-bit architecture, which can theoretically accommodate an 8 EB file system. However, the current supported maximum size of a GFS2 file system is 25 TB.
Important Note from Red Hat:
Although a GFS2 file system can be implemented in a standalone system or as part of a cluster configuration, for the Red Hat Enterprise Linux 6 release Red Hat does not support the use of GFS2 as a single-node file system. Red Hat does support a number of high-performance single node file systems which are optimized for single node and thus have generally lower overhead than a cluster file system. Red Hat recommends using these file systems in preference to GFS2 in cases where only a single node needs to mount the file system. Red Hat will continue to support single-node GFS2 file systems for mounting snapshots of cluster file systems (for example, for backup purposes).
Limitations of Traditional Filesystems (ext2/ext3/ext4) in a Clustered Environment
Step 1: On gurkulclu-node1, creating an ext4 filesystem on top of the shared storage managed by CLVM
[root@gurkulclu-node1 ~]# mkfs.ext4 /dev/vgCLUSTER/clvolume1
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
25688 inodes, 102400 blocks
5120 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
1976 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
Step 2: Mounting the new filesystem on the /clvmnt mount point
[root@gurkulclu-node1 ~]# mkdir /clvmnt
[root@gurkulclu-node1 ~]# mount /dev/vgCLUSTER/clvolume1 /clvmnt
Step 3: On gurkulclu-node2, directly mounting the newly created filesystem to the mount point /clvmnt
[root@gurkulclu-node2 ~]# mkdir /clvmnt
[root@gurkulclu-node2 ~]# mount /dev/vgCLUSTER/clvolume1 /clvmnt
Step 4: on gurkulclu-node1, creating a test directory under the mount point
[root@gurkulclu-node1 ~]# cd /clvmnt
[root@gurkulclu-node1 clvmnt]# mkdir test-under-ext4-fs
[root@gurkulclu-node1 clvmnt]# ls -l
total 14
drwx------. 2 root root 12288 Sep 21 12:53 lost+found
drwxr-xr-x. 2 root root 1024 Sep 21 12:54 test-under-ext4-fs
Step 5: On gurkulclu-node2, verify whether the newly created directory is visible. It does not appear.
[root@gurkulclu-node2 ~]# cd /clvmnt
[root@gurkulclu-node2 clvmnt]# ls
lost+found
Step 6: On gurkulclu-node2, unmount and remount the volume to reflect the actual content of the filesystem
[root@gurkulclu-node2 ~]# umount /clvmnt
[root@gurkulclu-node2 ~]# mount /dev/vgCLUSTER/clvolume1 /clvmnt
[root@gurkulclu-node2 ~]# ls -l /clvmnt
total 14
drwx------. 2 root root 12288 Sep 21 12:53 lost+found
drwxr-xr-x. 2 root root 1024 Sep 21 12:54 test-under-ext4-fs
[root@gurkulclu-node2 ~]#
Note: There is no supported tool or command-line option to convert an ext3/ext4 filesystem to GFS/GFS2, or to convert an existing shared GFS/GFS2 filesystem back to ext3/ext4. The only way is to back up the data residing on the ext3/ext4 filesystem, create a new GFS/GFS2 volume per the requirement, and restore the data onto the GFS/GFS2 filesystem.
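As a rough illustration, a migration from ext4 to GFS2 on this shared volume would look something like the sketch below. The /backup path and archive name are just assumptions for the example, and all nodes must unmount the ext4 filesystem first; the mkfs.gfs2 options match the ones used later in this article.
# on one node, archive the existing ext4 data to a location outside the volume
tar -czf /backup/clvmnt-data.tar.gz -C /clvmnt .
umount /clvmnt
# recreate the volume as GFS2 (this destroys the ext4 data on the device)
mkfs.gfs2 -p lock_dlm -t Gurkulcluster:gfs2fs -j 2 /dev/vgCLUSTER/clvolume1
# mount the new GFS2 filesystem and restore the archived data into it
mount -t gfs2 /dev/vgCLUSTER/clvolume1 /clvmnt
tar -xzf /backup/clvmnt-data.tar.gz -C /clvmnt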
Initial GFS2 Filesystem Configuration
Identifying Required Information
The Syntax to Create a GFS2 Filesystem is as Below:
mkfs.gfs2 -p lock_dlm -t ClusterName:FSName -j NumberJournals BlockDevice
[root@gurkulclu-node1 ~]# mkfs.gfs2 -h
Usage:
mkfs.gfs2 [options] <device> [ block-count ]
Options:
-b <bytes> Filesystem block size
-c <MB> Size of quota change file
-D Enable debugging code
-h Print this help, then exit
-J <MB> Size of journals
-j <num> Number of journals
-K Don't try to discard unused blocks
-O Don't ask for confirmation
-p <name> Name of the locking protocol
-q Don't print anything
-r <MB> Resource Group Size
-t <name> Name of the lock table
-u <MB> Size of unlinked file
-V Print program version information, then exit
Before proceeding to create the GFS2 filesystem, we need to gather the below information that is required to create a new GFS2 file system.
Identify Required Block Size: Default (4K) Blocks Are Preferred
As of the Red Hat Enterprise Linux 6 release, the mkfs.gfs2 command attempts to estimate an optimal block size based on device topology. In general, 4K blocks are the preferred block size because 4K is the default page size (memory) for Linux.
Identify the Number of Journals Required to Create the Filesystem
GFS2 requires one journal for each node in the cluster that needs to mount the file system. For example, if you have a 16-node cluster but need to mount only the file system from two nodes, you need only two journals. If you need to mount from a third node, you can always add a journal with the gfs2_jadd command. With GFS2, you can add journals on the fly.
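For example, if a third node later needs to mount the filesystem built in this article, a journal could be added roughly as shown below. gfs2_jadd runs against the mount point of a mounted GFS2 filesystem; /clvgfs is the mount point used later in this setup.
# add one more journal so that a third node can mount the filesystem
gfs2_jadd -j 1 /clvgfs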
Identify Required Journal Size: Default (128MB) Is Usually Optimal
When you run the mkfs.gfs2 command to create a GFS2 file system, you may specify the size of the journals. If you do not specify a size, it will default to 128MB, which should be optimal for most applications.
Identify Required Size and Number of Resource Groups
When a GFS2 file system is created with the mkfs.gfs2 command, it divides the storage into uniform slices known as resource groups. It attempts to estimate an optimal resource group size (ranging from 32MB to 2GB). You can override the default with the -r option of the mkfs.gfs2 command.
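Putting the above tuning knobs together, a mkfs.gfs2 invocation that overrides the block size, journal size and resource group size might look like the sketch below. The values are only illustrative; the actual command used for this article (with defaults) appears in the configuration section further down.
# 4K blocks, 128MB journals, 256MB resource groups (illustrative values)
mkfs.gfs2 -p lock_dlm -t Gurkulcluster:gfs2fs -j 2 \
          -b 4096 -J 128 -r 256 /dev/vgCLUSTER/clvolume1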
Identify the Cluster Name
Below is my current configuration, and we can see my cluster is named “Gurkulcluster”:
[root@gurkulclu-node1 ~]# clustat
Cluster Status for Gurkulcluster @ Sat Sep 21 13:15:04 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
gurkulclu-node1 1 Online, Local, rgmanager
gurkulclu-node2 2 Online, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:HAwebService (gurkulclu-node2) stopped
[root@gurkulclu-node1 ~]#
Identify Required Locking Protocol
We will use the “lock_dlm” protocol in this case.
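If clustat is not handy, the cluster name and quorum state can also be checked with cman_tool; this is just an alternative way to confirm the value that goes into the -t lock table option:
# confirm the cluster name and that the cluster is quorate
cman_tool status | grep -E 'Cluster Name|Quorum'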
Understand the GFS2 Node Locking Mechanism:
In order to get the best performance from a GFS2 file system, each node has its own page cache which may contain some portion of the on-disk data. The difference between a single node file system and GFS2, then, is that a single node file system has a single cache, while GFS2 has a separate cache on each node. In both cases, the latency to access cached data is of a similar order of magnitude, but the latency to access uncached data is much greater in GFS2 if another node has previously cached that same data.
GFS2 uses a locking mechanism called glocks (pronounced gee-locks) to maintain the integrity of the cache between nodes. The glock subsystem provides a cache management function which is implemented using the distributed lock manager (DLM) as the underlying communication layer.
The glocks provide protection for the cache on a per-inode basis, so there is one lock per inode which is used for controlling the caching layer. If that glock is granted in shared mode (DLM lock mode: PR) then the data under that glock may be cached upon one or more nodes at the same time, so that all the nodes may have local access to the data.
If the glock is granted in exclusive mode (DLM lock mode: EX) then only a single node may cache the data under that glock. This mode is used by all operations which modify the data (such as the write system call).
If another node requests a glock which cannot be granted immediately, then the DLM sends a message to the node or nodes which currently hold the glocks blocking the new request to ask them to drop their locks. Dropping glocks can be (by the standards of most file system operations) a long process. Dropping a shared glock requires only that the cache be invalidated, which is relatively quick and proportional to the amount of cached data.
Dropping an exclusive glock requires a log flush, and writing back any changed data to disk, followed by the invalidation as per the shared glock.
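The glock state can actually be observed at runtime through debugfs, which is handy when diagnosing cross-node contention. A minimal sketch, assuming debugfs is not yet mounted and using the lock table name created later in this article:
# mount debugfs if it is not already mounted
mount -t debugfs none /sys/kernel/debug
# dump the glocks of the GFS2 filesystem (directory is named ClusterName:FSName;
# the filesystem must be mounted on this node)
head /sys/kernel/debug/gfs2/Gurkulcluster:gfs2fs/glocks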
Configuration of GFS2 File system
Step 1: Create the GFS2 filesystem using the following parameters
- cluster name: Gurkulcluster
- filesystem name: gfs2fs
- device path: /dev/vgCLUSTER/clvolume1
- number of journals: 2
- locking mechanism: lock_dlm
[root@gurkulclu-node1 ~]# mkfs.gfs2 -j 2 -p lock_dlm -t Gurkulcluster:gfs2fs /dev/vgCLUSTER/clvolume1
This will destroy any data on /dev/vgCLUSTER/clvolume1.
It appears to contain: symbolic link to `../dm-6'
Are you sure you want to proceed? [y/n] y
Device: /dev/vgCLUSTER/clvolume1
Blocksize: 4096
Device Size 0.49 GB (128000 blocks)
Filesystem Size: 0.49 GB (127997 blocks)
Journals: 2
Resource Groups: 2
Locking Protocol: “lock_dlm”
Lock Table: “Gurkulcluster:gfs2fs”
UUID: ee264aa4-6ab9-18ed-c994-eb577c28399a
Note: In the above output, Resource Groups shows as 2, which means the 500M volume is divided into two RGs of roughly 250M each.
The RG size is important when you are planning for filesystem expansion: whenever you want to grow the filesystem using gfs2_grow, you should allocate at least the size of one RG.
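As a sketch of what such an expansion would look like once the filesystem is mounted later in this article (the +256M figure is only an example and assumes the volume group still has free extents):
# extend the clustered logical volume
lvextend -L +256M /dev/vgCLUSTER/clvolume1
# grow the mounted GFS2 filesystem into the new space (run on one node only)
gfs2_grow /clvgfs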
Step 2: Configure /etc/fstab to mount the newly created gfs2 filesystem automatically during boot
[root@gurkulclu-node1 ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Thu Sep 5 14:06:56 2013
#
# Accessible filesystems, by reference, are maintained under ‘/dev/disk’
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_gurkulclunode1-lv_root / ext4 defaults 1 1
UUID=7263f27d-afbc-4a07-bf85-4354fa0651f9 /boot ext4 defaults 1 2
/dev/mapper/vg_gurkulclunode1-lv_swap swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/vgCLUSTER/clvolume1 /clvgfs gfs2 defaults 0 0
[root@gurkulclu-node1 ~]#
[root@gurkulclu-node1 ~]# mkdir /clvgfs
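As an optional tuning note (not used in this walkthrough), Red Hat generally recommends mounting GFS2 with noatime and nodiratime to reduce cross-node lock traffic for read-only access; the fstab entry would then look something like:
/dev/vgCLUSTER/clvolume1 /clvgfs gfs2 noatime,nodiratime 0 0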
Step 3: Make sure the GFS2 and DLM modules are already loaded in the kernel; if not, load them with the modprobe command
[root@gurkulclu-node1 ~]# lsmod |grep gfs
gfs2 545168 2
dlm 148231 32 gfs2
configfs 29538 2 dlm
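If the modules were not listed, they could be loaded manually, for example:
# load the GFS2 and DLM modules (dlm is normally brought up by the cluster stack anyway)
modprobe gfs2
modprobe dlm
lsmod | grep -E 'gfs2|dlm'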
Step 4: Start the GFS2 service
[root@gurkulclu-node1 ~]# service gfs2 restart
Mounting GFS2 filesystem (/clvgfs): invalid device path “/dev/vgCLUSTER/clvolume1” [FAILED]
The service startup failed, so below is a little troubleshooting to fix the problem.
Troubleshooting check 1: lvs doesn't show the clustered volumes
[root@gurkulclu-node1 ~]# lvs
connect() failed on local socket: Connection refused
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group vgCLUSTER
Skipping volume group vgCLUSTER
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
lv_root vg_gurkulclunode1 -wi-ao--- 3.66g
lv_swap vg_gurkulclunode1 -wi-ao--- 3.84g
Troubleshooting check 2: The device path does not exist
[root@gurkulclu-node1 ~]# ls “/dev/vgCLUSTER/clvolume1”
ls: cannot access /dev/vgCLUSTER/clvolume1: No such file or directory
Troubleshooting check 3: The clustered PVs are not visible
[root@gurkulclu-node1 ~]# pvs
connect() failed on local socket: Connection refused
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.
Skipping clustered volume group vgCLUSTER
Skipping volume group vgCLUSTER
PV VG Fmt Attr PSize PFree
/dev/vda2 vg_gurkulclunode1 lvm2 a-- 7.51g 0
Troubleshooting check 4: All of that was caused by clvmd not running, so I just started it.
[root@gurkulclu-node1 ~]# service clvmd start
Starting clvmd:
Activating VG(s): 1 logical volume(s) in volume group “vgCLUSTER” now active
2 logical volume(s) in volume group “vg_gurkulclunode1” now active [ OK ]
Troubleshooting check 5: Now all volumes and PVs are back online
[root@gurkulclu-node1 ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/mapper/1IET_00010002p1 vgCLUSTER lvm2 a-- 780.00m 280.00m
/dev/vda2 vg_gurkulclunode1 lvm2 a-- 7.51g 0
[root@gurkulclu-node1 ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
clvolume1 vgCLUSTER -wi-a---- 500.00m
lv_root vg_gurkulclunode1 -wi-ao--- 3.66g
lv_swap vg_gurkulclunode1 -wi-ao--- 3.84g
[root@gurkulclu-node1 ~]# chkconfig clvmd on
Start the GFS2 service again; now the service starts fine:
[root@gurkulclu-node1 ~]# service gfs2 restart
Mounting GFS2 filesystem (/clvgfs): [ OK ]
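Since the failure above was simply a service that had not been started, it is also worth making the whole stack persistent across reboots, roughly as below (these are the RHEL 6 cluster service names; adjust to what is actually deployed in your cluster):
chkconfig cman on
chkconfig clvmd on
chkconfig gfs2 on
chkconfig rgmanager on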
Step 5: On gurkulclu-node1, check that the volume mounted properly.
[root@gurkulclu-node1 ~]# df -h /clvgfs
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vgCLUSTER-clvolume1
500M 259M 242M 52% /clvgfs
[root@gurkulclu-node1 ~]#
Step 6: On gurkulclu-node2, make the /etc/fstab entry, check the GFS2 & DLM modules, start the service, and verify the volume mount
[root@gurkulclu-node2 ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Thu Sep 5 14:20:11 2013
#
# Accessible filesystems, by reference, are maintained under ‘/dev/disk’
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_gurkulclunode2-lv_root / ext4 defaults 1 1
UUID=18493b92-f8aa-4107-8e63-eaa75a8c3f01 /boot ext4 defaults 1 2
/dev/mapper/vg_gurkulclunode2-lv_swap swap swap defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/vgCLUSTER/clvolume1 /clvgfs gfs2 defaults 0 0
[root@gurkulclu-node2 ~]#
[root@gurkulclu-node2 ~]# lsmod |grep gfs
gfs2 545168 0
dlm 148231 27 gfs2
configfs 29538 2 dlm
[root@gurkulclu-node2 ~]# service gfs2 restart
Mounting GFS2 filesystem (/clvgfs): [ OK ]
[root@gurkulclu-node2 ~]# df -h /clvgfs
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vgCLUSTER-clvolume1
500M 259M 242M 52% /clvgfs
Verifying the Functioning of GFS2
Step 1: On gurkulclu-node1, create a test directory and a few files under /clvgfs
[root@gurkulclu-node1 clvgfs]# ls -lR /clvgfs
/clvgfs:
total 8
drwxr-xr-x. 2 root root 3864 Sep 21 15:07 test-gfs-fs-data
/clvgfs/test-gfs-fs-data:
total 24
-rw-r--r--. 1 root root 0 Sep 21 15:07 a
-rw-r--r--. 1 root root 0 Sep 21 15:07 b
-rw-r--r--. 1 root root 0 Sep 21 15:07 c
[root@gurkulclu-node1 clvgfs]#
Step 2: On gurkulclu-node2, verify the content of /clvgfs without a remount
[root@gurkulclu-node2 ~]# ls -lR /clvgfs
/clvgfs:
total 8
drwxr-xr-x. 2 root root 3864 Sep 21 15:07 test-gfs-fs-data
/clvgfs/test-gfs-fs-data:
total 24
-rw-r--r--. 1 root root 0 Sep 21 15:07 a
-rw-r--r--. 1 root root 0 Sep 21 15:07 b
-rw-r--r--. 1 root root 0 Sep 21 15:07 c
[root@gurkulclu-node2 ~]#
And we see them exactly as they are. That confirms data visibility is real time on all the cluster nodes accessing the shared storage.
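A couple of extra checks can confirm that each node really has the GFS2 filesystem mounted and how many journals it carries; the gfs2_tool syntax may vary slightly between gfs2-utils versions, so treat this as a sketch:
# run on each node: confirm the gfs2 mount is present
grep gfs2 /proc/mounts
# print information about the journals of the mounted filesystem
gfs2_tool journals /clvgfs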
Quick Cheat Sheet: Red Hat Linux supported filesystems and the maximum supported file and filesystem sizes
Maximum (individual) file size
Filesystem | RHEL 3 | RHEL 4 | RHEL 5 | RHEL 6 |
---|---|---|---|---|
EXT2/3 | 2TB | 2TB | 2TB | 2TB |
EXT4 | n/a | n/a | 16TB (5.4 or later) [2] | 16TB |
GFS1 | 2TB | 16TB supported, 8EB limit | 16TB supported, 8EB maximum | n/a |
GFS2 | n/a | n/a | 25TB | 100TB |
XFS | n/a | n/a | 100TB | 100TB |
Maximum filesystem size
Filesystem | RHEL 3 | RHEL 4 | RHEL 5 | RHEL 6 |
---|---|---|---|---|
EXT2/3 | 2TB | 8TB | 16TB (8TB in 5.0) [4] | 16TB |
EXT4 | n/a | n/a | 16TB (5.4 or later) [2] | 16TB |
GFS | 2TB | 16TB supported, 8EB limit | 16TB supported, 8EB limit | n/a |
GFS2 | 2TB | 16TB | 25TB [5] | 100TB |
XFS | n/a | n/a | 100TB | 100TB |