Geo-Replication in Gluster



What is geo-replication?
It is as simple as having a copy of your data somewhere else on the globe!

Data from one location is asynchronously replicated to a secondary location, so that the same data exists in both locations.

Thus a backup exists at all times: even if the data in the primary location is completely destroyed, the same data can always be retrieved from the secondary location(s)!

Cool!

But how does this work?
Let me explain with respect to GlusterFS.

It all happens this way:
1) Let's say you have a volume (I will call it the primary volume).
2) Now you want to have it replicated.
3) So you create a new volume in a different location (I will call it the secondary volume).
4) You want all the existing data copied from the primary volume to the secondary volume, and new changes synced whenever they are made on the primary volume.
5) So you create a gluster geo-replication session, which takes care of all this replication for you :)

The primary volume is called the master. The sync is unidirectional, from the master to the secondary volume (the slave).

How is the connection established between the master and the slave?
Click here to know...

How are the changes in master detected?
To understand this, you should know what xtime and stime are.

stime is an extended attribute stored on each master brick root. It holds the timestamp up to which the changelogs have been completely synced to the slave.

Wondering what a changelog is? Read about it here.

xtime is an extended attribute stored on all file system objects, both files and directories. Whenever a file or directory is modified, its xtime is updated with the modification time.

Gluster geo-replication detects the changes made on master by crawling the master volume.
There are three types of crawl:

1) Hybrid crawl / Xsync crawl:
It happens only when there is already data in the master volume before geo-replication is set up.

If geo-rep is set up before any data is created on master, it never goes into hybrid crawl.

On each directory, it compares stime with xtime and descends into the directory only if
stime < xtime (i.e. the last time the data was synced to the slave is earlier than the last time a modification was made on master). This works because a directory's xtime also reflects changes made beneath it.

2) History crawl:
Let's assume geo-rep is in the stopped state and there is one month of data pending to be synced to the slave. stime marks the point up to which geo-rep has synced.
When geo-rep is started after a month, all the changelogs recorded since stime are processed. This phase is called history crawl.

3) Changelog crawl:
When live changelogs or close to live changelogs are processed to sync the data to slave, we call it changelog crawl.
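
To make the stime/xtime comparison of the hybrid crawl a bit more concrete, here is a minimal sketch in Python. It is not how gsyncd is written: the Node class, the integer timestamps and the hybrid_crawl function are all made up for illustration, and the volume is just an in-memory tree.

```python
# Hypothetical in-memory model of a master volume: every directory and file
# carries an xtime (last modification time). Names and layout are assumptions.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    xtime: int                                     # last modification time
    children: list = field(default_factory=list)   # empty for regular files


def hybrid_crawl(node, stime, path=""):
    """Return the paths whose xtime is newer than stime.

    A subtree is skipped entirely when its root does not satisfy
    stime < xtime, so unchanged directories are never entered.
    """
    full_path = f"{path}/{node.name}" if path else node.name
    if not stime < node.xtime:
        return []                   # nothing changed here since the last sync
    changed = [full_path]
    for child in node.children:
        changed.extend(hybrid_crawl(child, stime, full_path))
    return changed


root = Node("master", xtime=120, children=[
    Node("docs", xtime=120, children=[
        Node("a.txt", xtime=120),
        Node("b.txt", xtime=80),
    ]),
    Node("logs", xtime=90, children=[Node("old.log", xtime=90)]),
])

print(hybrid_crawl(root, stime=100))
# ['master', 'master/docs', 'master/docs/a.txt']
```

Notice that the logs directory is never entered, because its xtime is older than stime.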


How does the data sync from master to slave?
First comes entry syncing, where a zero-byte file is created on the slave with the same gfid as on the master.

Here, gfid is a unique identifier for a file in a gluster volume, similar to an inode number in a file system.

Then comes data syncing.
Data can be synced using either one of these methods:
  • Rsync over ssh (default method)
  • Tar over ssh

Hmm.. how are rsync and tar over ssh different?
The first sync with rsync will be slow, similar to tar over ssh, because all files are copied. After that, when new changes are made on the master volume, only the changes are copied to the slave, making replication faster and more efficient.
Thus rsync is recommended for large files.

Tar is an archive utility. A tarball is created of all the files to be synced, unlike rsync, which transfers only deltas once the initial copy exists at the secondary location.
Thus tar is recommended for small files.

How frequently does the sync occur?
As I said, it is asynchronous. Although the replication process may occur in near-real-time, it is more common for replication to occur on a scheduled basis.

A few changelogs are batched together and processed:
  • First, the entry and metadata changes are synced serially from each changelog of the batch.
  • Data changes are added to the rsync/tarssh queue as soon as the entry and metadata changes from the first changelog of the batch are synced.
  • Once all the entry, metadata and data changes within the batch are synced, the stime is updated on the brick root.
  • If something breaks in between, the batch is re-processed.
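
The batch handling above can be sketched roughly like this. Everything in the sketch is an assumption made for illustration: the Changelog class, the two sync callbacks and the use of OSError as "something broke" are not gsyncd's real API. It is also simplified, since the data is synced only after all entry operations, whereas the real worker starts queueing data earlier.

```python
# All names here (Changelog, sync_entry_and_metadata, sync_data) are
# hypothetical stand-ins, not gsyncd's real API.
from dataclasses import dataclass


@dataclass
class Changelog:
    timestamp: int
    entry_ops: list      # entry and metadata changes, e.g. "CREATE file1"
    data_ops: list       # files whose contents changed


def process_batch(batch, stime, sync_entry_and_metadata, sync_data):
    """Process one batch of changelogs and return the new stime.

    On any failure the old stime is returned, so the same batch is picked
    up again on the next attempt.
    """
    if not batch:
        return stime
    data_queue = []
    try:
        for changelog in batch:                     # serial, in order
            for op in changelog.entry_ops:
                sync_entry_and_metadata(op)
            data_queue.extend(changelog.data_ops)   # queued for rsync/tarssh
        for path in data_queue:
            sync_data(path)
    except OSError:
        return stime                                # batch gets re-processed
    return max(c.timestamp for c in batch)          # whole batch is synced


synced = []
batch = [
    Changelog(101, ["CREATE file1"], ["file1"]),
    Changelog(102, ["CREATE file2", "CHMOD file1"], ["file2"]),
]
print(process_batch(batch, 100, synced.append, synced.append))  # 102
```

The point to take away is that stime moves forward only after the whole batch has succeeded, which is what makes re-processing safe.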

Now that you have come this far, I will tell you about the geo-rep processes, which carry out the tasks mentioned above.

There are three processes, namely:
1) Monitor process
2) Worker process
3) Agent process
     
There is a file called gsyncd.py inside the glusterfs repository.
It acts as the single entry point for geo-replication.

Based on the arguments passed, it acts as monitor, worker or agent.
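
The idea of one entry point that takes on different roles can be sketched with argparse sub-commands. This only illustrates the pattern: the role names come from the text above, but the argument layout and the run_* functions are my own assumptions and do not reflect gsyncd.py's real command line.

```python
# Hypothetical sketch: the sub-command names mirror the three roles in the
# text, but the argument layout is an assumption, not gsyncd.py's real CLI.
import argparse


def run_monitor(args):
    return f"monitor for session {args.session}"


def run_worker(args):
    return f"worker for brick {args.brick}"


def run_agent(args):
    return f"agent for brick {args.brick}"


def main(argv):
    parser = argparse.ArgumentParser(prog="entry_point_sketch")
    roles = parser.add_subparsers(dest="role", required=True)

    monitor = roles.add_parser("monitor")
    monitor.add_argument("session")
    monitor.set_defaults(func=run_monitor)

    for name, func in (("worker", run_worker), ("agent", run_agent)):
        sub = roles.add_parser(name)
        sub.add_argument("brick")
        sub.set_defaults(func=func)

    args = parser.parse_args(argv)
    return args.func(args)          # dispatch to the selected role


print(main(["worker", "/bricks/b1"]))   # worker for brick /bricks/b1
```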

When we start the geo-rep session using the command:
# gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start

the gluster management daemon starts the monitor process.

The monitor process watches the worker and agent processes. If a worker or agent crashes, the monitor restarts it. (Note: if the agent crashes, the worker automatically goes down with it, and vice versa.)

There is one monitor process per geo-rep session on a node.

The agent process consumes the changelogs generated by the changelog xlator.
The worker process uses the parsed changelogs produced by the agent process to sync the data to the slave.

Each brick has one agent and one worker process.
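
Here is a toy version of the "monitor restarts crashed workers" behaviour. It is only a sketch under strong assumptions: workers are plain Python callables that run one after another, a RuntimeError stands in for a crash, and max_restarts is a made-up limit. The real monitor supervises separate processes.

```python
# Hypothetical sketch: workers are plain callables here; the real monitor
# supervises separate processes.
def monitor(bricks, start_worker, max_restarts=3):
    """Run one worker per brick, restarting a worker each time it crashes.

    Returns a dict mapping each brick to the number of restarts it needed.
    """
    restarts = {}
    for brick in bricks:
        restarts[brick] = 0
        while True:
            try:
                start_worker(brick)     # returns normally on a clean exit
                break
            except RuntimeError:        # stands in for a worker crash
                if restarts[brick] >= max_restarts:
                    raise               # give up after too many crashes
                restarts[brick] += 1
    return restarts


attempts = {}


def flaky_worker(brick):
    # Crash on the first attempt for brick b2 only.
    attempts[brick] = attempts.get(brick, 0) + 1
    if brick == "b2" and attempts[brick] == 1:
        raise RuntimeError("worker crashed")


print(monitor(["b1", "b2"], flaky_worker))   # {'b1': 0, 'b2': 1}
```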

Let us try it out:
Let us create our own geo-rep session, following the instructions mentioned in this blog.

Let us see if data is really replicated from master to slave :p

How do we do that?
Mount the master and slave volumes on mount points, using this command:

# mount -t glusterfs <hostname>:<volume-name> /mount_point
eg:
# mount -t glusterfs virtual_machine1:master /dir1/mount_point_master
# mount -t glusterfs virtual_machine2:slave /dir2/mount_point_slave

Create files inside the master mount point:
# cd /dir1/mount_point_master
# touch file{1..10}

Now check inside the slave mount point:
# ls /dir2/mount_point_slave
See that file1 to file10 have been replicated :)
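
If you have more than a handful of files, comparing the two mount points by eye gets tedious. Below is a small sketch that lists files present on the master mount but not on the slave mount. The helper names are my own, and the demo runs against two temporary directories so that it works anywhere.

```python
# Hypothetical helpers for comparing two directory trees by file name only.
import os
import tempfile


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_on_slave(master_mount, slave_mount):
    """Return files present on the master mount but not on the slave."""
    return sorted(relative_files(master_mount) - relative_files(slave_mount))


# Demo with temporary directories standing in for the two mount points.
with tempfile.TemporaryDirectory() as master, \
        tempfile.TemporaryDirectory() as slave:
    for name in ("file1", "file2"):
        open(os.path.join(master, name), "w").close()
    open(os.path.join(slave, "file1"), "w").close()
    print(missing_on_slave(master, slave))   # ['file2']
```

On a real setup you would pass /dir1/mount_point_master and /dir2/mount_point_slave instead. Since the sync is asynchronous, freshly created files may show up as missing for a short while. The sketch compares names only, not file contents.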

Also check the log files for more details.
All logs can be found at:
/var/log/glusterfs/

Geo-rep specific logs are present at:
/var/log/glusterfs/geo-replication/

But is there any mechanism for data retrieval?
Yes! If the master node gets destroyed or goes offline, we can promote our slave to master. All data access can then be done from the promoted volume (the new master volume). We call this procedure failover.

See how...

When the original master is back online, we can perform a failback procedure on the original slave so that it synchronizes the differences back to the original master.
     

So this was it. Here ends my narration of the geo-replication story :) Do explore more!

