Distributed File System Essay, Research Paper
CODA is an experimental distributed file system being developed at Carnegie Mellon University. Dr. M. Satyanarayanan heads the group, whose purpose is to design and implement a distributed file system that allows for transparent mobile computing in a client-server environment. The goals set forth for CODA include disconnected operation for mobile clients, failure resistance, performance, scalability and security.
Most aspects of CODA’s basic architecture are inherited from its predecessor, AFS (the Andrew File System). Like AFS, CODA makes a distinction between servers and clients. A CODA server consists of partitions available to the file server. The partitions are made up of volumes that contain files. Each volume is organized like a directory structure: a root directory and the tree below it. A typical server holds hundreds of volumes of around 10MB each. The use of volumes makes large amounts of data more manageable and flexible. CODA stores volume and directory information, along with access control lists and file attributes, in raw partitions. These partitions are accessed through a log-based recoverable virtual memory package (RVM). Only the file data itself is stored in the server partition files.
Each of the volumes has a name and an ID. Volumes can be mounted anywhere in /coda except for under existing directories. A new directory will be created as part of the mount process (the volume name cannot conflict with existing directories in order to eliminate confusion). CODA makes the mounting points invisible to the user; they appear as regular directories.
Files in CODA are identified by a FID (file identifier). The FID consists of three 32-bit integers: a VolumeId, a VnodeId and a Uniquifier. The VolumeId identifies the volume the file resides in, the VnodeId is the inode number of the file, and the Uniquifier guarantees that no FID is ever used more than once.
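The three-part identifier can be sketched as follows. This is a minimal illustration in Python, not Coda's actual data structure; the field names follow the essay and the example values are hypothetical.

```python
from dataclasses import dataclass
import itertools

# Monotonic counter standing in for Coda's uniquifier allocation.
_uniquifier = itertools.count(1)

@dataclass(frozen=True)
class Fid:
    volume_id: int   # which volume the file resides in (32-bit)
    vnode_id: int    # inode number of the file within the volume (32-bit)
    uniquifier: int  # guarantees no FID is ever used more than once

def make_fid(volume_id: int, vnode_id: int) -> Fid:
    """Mint a new FID; masking keeps the fields within 32 bits."""
    return Fid(volume_id & 0xFFFFFFFF, vnode_id & 0xFFFFFFFF,
               next(_uniquifier) & 0xFFFFFFFF)

a = make_fid(0x7F000210, 42)
b = make_fid(0x7F000210, 42)  # same volume and vnode, yet a distinct FID
```

Even when a vnode number is reused after a file is deleted, the fresh uniquifier keeps the new FID distinct from the old one.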
CODA stores replicated copies of volumes on a group of servers for higher availability and reliability. The list of servers that hold a copy of a replicated volume is its VSG, or Volume Storage Group. Each replicated volume’s VolumeId is itself replicated; the replicated VolumeId ties the VSG to the local volume held by each of its members.
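The relationship between a replicated VolumeId, its VSG, and the members' local volumes can be pictured as a small lookup table. The server names and VolumeId values below are purely hypothetical.

```python
# Hypothetical mapping: a replicated VolumeId resolves to its Volume
# Storage Group (VSG) and to the local VolumeId on each member server.
VSG_TABLE = {
    0x7F000210: {                           # replicated VolumeId
        "server1.example.org": 0x01000210,  # local volume on member 1
        "server2.example.org": 0x02000210,  # local volume on member 2
        "server3.example.org": 0x03000210,  # local volume on member 3
    },
}

def vsg_members(replicated_volume_id: int) -> list[str]:
    """Return the servers holding replicas of the given volume."""
    return sorted(VSG_TABLE[replicated_volume_id])
```

A client resolving the replicated VolumeId thus learns both where the replicas live and which local volume to ask each server for.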
The inner workings of a file system operation in CODA begin in much the same way they would in many other file systems. The client requests a file and a few system calls are made in relation to it; that is, the program asks the kernel for service. The kernel attempts to find the inode of the file and return a file handle associated with it. The inode contains the information needed to access the file data, and the file handle is used by the opening program. The call to open the file is handled by the Virtual File System (VFS) in the kernel; once the VFS realizes the requested file is in the coda file system, the call is passed to Venus, CODA’s cache manager. Venus will first check the client disk cache to see if the file is already cached locally. If the file resides in the local cache it is read immediately from the local disk.
In the event the file is not in the client disk cache, Venus must contact a server for it. The file must come from a member of the VSG, but not every VSG member is reachable by every client. This can be due to security considerations, a downed server, or a myriad of other reasons. Because of this, each client maintains an Available Volume Storage Group (AVSG) of the servers it can actually contact.
Clients also have a preferred server within the AVSG. The preferred server is chosen by factors such as physical proximity, load and/or server CPU power. Venus also verifies with the rest of the AVSG that the preferred server has the latest copy of the requested data. If it does not, the data is fetched from the server with the latest copy, which then becomes the preferred server. Venus then notifies the AVSG that one (or more) of its members has stale data. The preferred server then establishes a callback: a promise made by the server that it will notify the client before allowing any other client to modify the data.
When Venus locates the file it responds to the kernel, which then returns control to the calling program from the system call. The first time the kernel passes on an open request for a file, Venus fetches the entire file from the server using remote procedure calls. The file is then stored in the cache area as a container file. It is now an ordinary file on the local disk, and read-write operations are almost entirely handled by the local file system. Venus caches all the directory files and attributes, and allows operations on a file to proceed without contacting the servers as long as the requested files are present in the cache. When a file is modified and closed, Venus updates the AVSG by sending the new file to every member through parallel remote procedure calls. Removal or addition of files and symbolic links is also sent to the AVSG. This is one area where update conflicts could arise. CODA uses an optimistic strategy to deal with these conflicts, which we will discuss later.
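The open path described above can be condensed into a short sketch: serve a cached container file from local disk when possible, otherwise fetch the whole file from the server and cache it. This is an illustrative simplification, not Venus's real interface; `fetch_from_server` stands in for the parallel RPC machinery.

```python
# Hypothetical sketch of Venus's open() path.
class Venus:
    def __init__(self, fetch_from_server):
        self.cache = {}                        # fid -> container file bytes
        self.fetch_from_server = fetch_from_server

    def open(self, fid):
        if fid in self.cache:                  # cache hit: local disk only
            return self.cache[fid]
        data = self.fetch_from_server(fid)     # whole-file fetch via RPC
        self.cache[fid] = data                 # stored as a container file
        return data

rpc_calls = []
def fake_fetch(fid):
    rpc_calls.append(fid)                      # record each server contact
    return b"file body"

venus = Venus(fake_fetch)
first = venus.open("fid-1")    # miss: fetched from the server
second = venus.open("fid-1")   # hit: served from cache, no RPC
```

The second `open` never touches the server, which is exactly what lets later operations proceed while disconnected, as long as the file was cached beforehand.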
CODA allows for transparent disconnected operation. It achieves this through a three-phase process: hoarding, server emulation and reintegration. The hoarding phase occurs before disconnection and is actually a cache management phase; hoarding attempts to capture immediate and future working-sets. Server emulation occurs during disconnection: a pseudo-server emulates the semantic and security checks made by the real servers, and keeps a detailed record of its actions in order to make the third phase quicker and more efficient. Reintegration is where the pseudo-server catches back up with the real servers. This includes conflict detection, automatic resolution and manual resolution when needed.
The purpose of hoarding is to prepare for a future disconnected period. Caching must be managed so as to minimize cache misses during disconnected operation, since such misses result in failure. Deciding what to hoard raises further problems: the hoarding algorithm should balance immediate and future working-sets, anticipated voluntary disconnections, the client cache size, and the user’s conditional utility function. The current algorithm is based on object priorities. A priority has two components: s, which represents how recently the object was used, and the hoard priority m, which represents the object’s expected future value to the user. The user can influence these through the front-end program named hoard, which assigns hoard priorities, sets the horizon parameter (controlling the relative weighting of m and s) and sets the decay parameter (controlling the rate at which s decreases).
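One way to render the priority scheme concretely is below. The exact formula is an assumption made for illustration; the essay only specifies that horizon weights m against s and that decay controls how fast s falls.

```python
# Hypothetical priority computation combining the hoard priority m
# (user-assigned future value) with a recency component s that decays
# the longer an object goes unreferenced.

def recency(references_ago: int, decay: float) -> float:
    """s: 1.0 for a just-used object, decaying linearly toward 0."""
    return max(0.0, 1.0 - decay * references_ago)

def priority(m: float, references_ago: int,
             horizon: float, decay: float) -> float:
    """horizon in [0, 1] weights m against s."""
    return horizon * m + (1.0 - horizon) * recency(references_ago, decay)

# A strongly hoarded but long-unused file can outrank a recently
# used file the user assigned little future value to.
hoarded = priority(m=0.9, references_ago=50, horizon=0.7, decay=0.02)
recent  = priority(m=0.1, references_ago=1,  horizon=0.7, decay=0.02)
```

Raising the horizon parameter shifts weight toward the user's stated intentions (m); raising decay makes recency evaporate faster, which has a similar effect.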
The purpose of server emulation is to provide the illusion of connectivity to the user and to allow for a smooth transition back into connected mode. To provide the illusion of connectivity the pseudo-server must precisely emulate the actions of the real server.
The client will request an object and a few system calls will be made, just as in connected operation. The kernel’s VFS will determine the file to be in the coda directory and the request will be passed off to Venus. Venus then checks the local cache for the file. If it is there, it is read from the local disk; if not, there is a cache miss, the call fails, and the illusion of connectivity is lost. It can be seen how heavily this phase depends on the hoarding phase. The second part of the server emulation phase is preparing for reconnection. The pseudo-server logs all actions that change or destroy a file in the CML (Client Modification Log).
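The pseudo-server's two duties during disconnection, serving from cache and logging mutations for later replay, might be sketched like this. The record format is hypothetical; the real CML stores far richer operation records.

```python
# Hypothetical sketch of server emulation: reads come from the hoarded
# cache (or fail), while mutating operations are appended to the CML
# for replay at reintegration time.
class PseudoServer:
    def __init__(self, hoarded_files):
        self.cache = dict(hoarded_files)  # captured during hoarding
        self.cml = []                     # ordered log of mutations

    def read(self, path):
        if path not in self.cache:
            # Cache miss while disconnected: the illusion is lost.
            raise FileNotFoundError(path)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        self.cml.append(("store", path, data))   # logged for replay

    def remove(self, path):
        del self.cache[path]
        self.cml.append(("remove", path))        # logged for replay

ps = PseudoServer({"/coda/doc.txt": b"v1"})
ps.write("/coda/doc.txt", b"v2")
ps.remove("/coda/doc.txt")
```

At reconnection the CML, replayed in order, reproduces the disconnected session's effects on the real servers, which is precisely where the conflict detection of the reintegration phase comes in.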
During reintegration all updates must be passed on to the servers so they become visible to the system. The technique is based on replaying the pseudo-server’s CML. Many conflicts could occur here, but the replay algorithm detects them and attempts to resolve them.
Each modification made at a server is given a unique storeid, supplied by the client performing the operation. The storeid is made up of the client’s IP address and a monotonically increasing timestamp. An object’s chronological sequence of storeids is called its update history. Since it is not efficient to keep the entire update history of a replica, an approximation is kept at the server, consisting of the history’s length and the latest storeid (LSID). All of the replication sites maintain an estimate of the approximate histories of each replica. The combination of all of these histories is called the Coda version vector (CVV).
The update of an existing object has two phases. First, each server in the AVSG checks the LSID and CVV supplied by the client against its own. The check succeeds if the client’s and server’s LSIDs are identical, or if the LSIDs differ but the client’s CVV is greater than or equal to each of the server’s corresponding CVV elements (this second case is not used for directories). If the checks fail, the user must resolve the inconsistency with the manual repair tool. If the checks succeed, the server replaces its LSID and CVV with the client’s. In the second phase, the CVVs on all the AVSG members are updated to reflect the client’s view; CODA uses the force operation to do this. A force is a server-to-server operation that logically replays the updates on the servers with stale data. A force can be triggered by Venus noticing unequal CVVs, by the system deciding a conflict can be resolved with a force, or by server crash recovery.
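The first-phase certification check can be expressed compactly. This is a sketch under the essay's description: identical LSIDs pass, and for files a client CVV that dominates the server's element by element also passes. The storeid and CVV values are made up for illustration.

```python
# Hypothetical certification check performed by each AVSG member.

def cvv_dominates(client_cvv, server_cvv):
    """True if every element of the client's CVV is >= the server's."""
    return all(c >= s for c, s in zip(client_cvv, server_cvv))

def certify(client_lsid, client_cvv, server_lsid, server_cvv,
            is_directory=False):
    if client_lsid == server_lsid:
        return True                 # identical latest storeids
    if not is_directory and cvv_dominates(client_cvv, server_cvv):
        return True                 # client has seen everything we have
    return False                    # conflict: manual repair needed

# Client dominates a server that missed one update at replica 2: passes.
ok = certify(("10.0.0.5", 7), (3, 3, 3), ("10.0.0.9", 6), (3, 2, 3))
# Client and server each saw an update the other missed: conflict.
conflict = certify(("10.0.0.5", 7), (2, 3, 3), ("10.0.0.9", 6), (3, 2, 3))
```

The second case, where neither vector dominates, is exactly the concurrent-update situation that the optimistic strategy accepts up front and pushes to resolution (automatic or manual) at reintegration.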
The security system in CODA falls into two parts: 1) authentication and secure connections and 2) access control and protected databases. CODA has a mechanism for securely authorizing clients to servers, servers to clients and setting up encrypted channels between them. The CODA user’s password is a key element in this. The authentication and encryption systems are very close to those used by Kerberos 5. Since another member of the class is covering Kerberos I will be brief in its discussion.
The key points are shared-secret cryptography, establishing secure connections, and the authentication protocol. Shared-secret cryptography rests on the assumption, made in numerous places, that the server and client on a connection share a secret key, allowing the sender to encrypt data and the receiver to decrypt it. There are three methods currently in place to acquire the key. First, upon authentication the server and the client can share a password to establish a secure connection; CODA employs a program that looks up the user name in /password and attempts a secure connection with the server using the client’s username as its ID and the password, then executes a bind request with the server. Second, in the final stages of authentication the server can send the client a session key. Third, Venus and the server can use the session key from the authentication protocol to establish a secure connection. Establishing a secure connection requires the two parties to share the secret key and a public client ID.
As of right now the CODA developers are attempting to make Kerberos work better with CODA. They are also attempting to plug a few other security holes, the most serious of which is the need for authenticated connections on the callbacks.
CODA has been in use at Carnegie Mellon University since the early 1990s, and over the years it has yet to lose any user data. At times servers have failed, but the replication of all volumes has ensured against any data loss: a new disk is inserted and then updated in order to bring the server back up to date.
New uses for CODA are springing up all the time. CODA could be used for FTP mirror sites and replicated WWW servers, as well as many other applications. The file system has now been ported to different platforms; so far the group has been successful in porting to Unix and Linux, and they are well on their way to completing a port to Windows 95. Over the next few years this easily available, portable and free distributed file system that allows for mobile computing could become a useful tool in the expanding computer market.