Enter this unique database structure and discover what makes Exchange work
Relational database technology lies at the heart of Microsoft Exchange Server. At its simplest, a relational database stores information in tables and uses matching values in the tables to relate information between the tables. When you understand Exchange Server's database technology, you can head off problems and optimize performance. Over the next two months, I'll take a detailed look at the databases that make up the Exchange Information Store. This month, I'll focus on the databases' internal structure, how they process transactions, and what happens when problems occur. Next month, I'll show you how to maintain the databases in the Information Store and maximize their performance.
Exchange Databases and the Information Store
Mailboxes, public folders, and directory information in Exchange are contained in three databases. The private information database holds user mailboxes, the public information database holds public folders, and the directory database holds directory and configuration data. One Windows NT service, the information store service (store.exe), manages the private and public information databases, which together are known as the Information Store. The directory and Information Store contain many files that are important to Exchange administrators. Table 1 lists some of these important files.
The Exchange database engine. Two versions of Microsoft's Jet database engine manage the Exchange databases. Jet Blue manages the databases in Exchange 4.0 and 5.0, and the Extensible Storage Engine (ESE or ESE97) manages the databases in Exchange 5.5. Microsoft designed both database engines to handle the transaction load Exchange's messaging system generates. Jet Blue and ESE run inside store.exe as either edb.dll or ese.dll, depending on the version of Exchange. The essential features of the Information Store do not change between Exchange 5.0 and 5.5. However, the ESE database engine in Exchange 5.5 is faster and more scalable than Jet Blue. When I refer to the Exchange database engine in this and next month's articles, I'm referring to the ESE engine in Exchange 5.5.
Exchange does not use SQL Server, Microsoft's best-known database, because SQL Server handles commercial transactions, which are usually consistent in form. Messages in Exchange vary in content and length, and this variation challenges a database. For example, some messages in Exchange go to single recipients and contain a few lines of text, whereas other messages go to many addresses and contain several pages of content and sometimes attachments. SQL Server's database design cannot process messages in Exchange effectively.
Single-Instance Storage Architecture
Exchange uses a single-instance storage model to process messages. Single-instance storage stores one copy of a message routed to multiple recipients in the Information Store and deposits a pointer to the message in each recipient's mailbox. The single-instance model is different from the classic LAN-based design for email, which sends a separate copy of a message to each of the message's recipients. The LAN model works well for small installations because it doesn't incur the overhead of a database, but it doesn't function effectively if the number of users rises to more than 100.
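To make the idea concrete, here is a minimal Python sketch of single-instance storage. It is an illustration only, not how the Information Store is implemented: the class name, the dictionary layout, and the integer message IDs are all assumptions I've made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SingleInstanceStore:
    """Toy store (hypothetical): one copy per message, pointers per mailbox."""
    messages: dict = field(default_factory=dict)   # message_id -> body
    mailboxes: dict = field(default_factory=dict)  # user -> list of message_ids
    next_id: int = 1

    def deliver(self, body, recipients):
        message_id = self.next_id
        self.next_id += 1
        self.messages[message_id] = body           # content is stored exactly once
        for user in recipients:
            # Each mailbox receives only a pointer (the ID), not a copy.
            self.mailboxes.setdefault(user, []).append(message_id)
        return message_id

    def read(self, user):
        # Follow each pointer back to the single stored copy.
        return [self.messages[mid] for mid in self.mailboxes.get(user, [])]


store = SingleInstanceStore()
store.deliver("Budget meeting at 3 pm", ["ann", "bob", "cho"])
```

Three recipients see the message, but the store holds one body. A LAN-style design would instead hold three copies.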
Increasing the effectiveness of single-instance storage. Single-instance storage is server-specific. The Information Store transfers messages between Exchange mailboxes that reside on one server. The Exchange Mail Transfer Agent (MTA) transfers messages between the mailboxes on different Exchange servers. Transferring messages to users on multiple servers increases data duplication and network traffic, and it can hinder Exchange's scalability. You can optimize the efficiency of single-instance storage in Exchange by ensuring that all members of a workgroup or department have mailboxes on one server.
Relational Tables Inside the Private Information Store
The internal structure of the public and private information stores and the directory is similar to the structure of a classic relational database; that is, the structure consists of tables. Let's look at how these tables function and interact in the private information store, where Exchange processes email.
The most important tables in the private information store are:
- The mailbox table: Each mailbox on a server has one row in this table, which holds the mailbox's properties.
- The folders table: Each folder in every mailbox has a row in this table.
- The message table: Each message has one row in this table, which holds the message's content.
- The attachments table: Each attachment has one row in this table, which holds the attachment's content.
- A set of message/folder tables: Each folder has its own message/folder table.
Pointers link one table to another within the private information store. Single-instance storage is based on the interactions between pointers and tables, and these interactions let Exchange deliver a unified view of the private information store's contents to clients.
Exchange supports nested folders, in which folders contain subfolders. Client machines construct a tree view of the folders in the private information store by reading the data in the store's folders table. Each folder has a globally unique ID (GUID). Each subfolder has a GUID and also carries its parent folder's ID, which identifies the subfolder as a subfolder. The sample data in Table 2 shows that the Articles and Newsflash folders are both subfolders of the Magazine folder.
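The way a client might rebuild the tree from parent IDs can be sketched in a few lines of Python. The rows below are hypothetical: I use short strings such as "f2" in place of real GUIDs, and the column layout is an assumption rather than the actual folders table schema. Only the Magazine, Articles, and Newsflash relationship comes from the sample data.

```python
from collections import defaultdict

# Hypothetical rows: (folder_id, parent_id, name). Real IDs would be GUIDs.
folders_table = [
    ("f1", None, "Inbox"),
    ("f2", None, "Magazine"),
    ("f3", "f2", "Articles"),
    ("f4", "f2", "Newsflash"),
]


def build_tree(rows):
    """Group folders by parent ID, then render an indented tree view."""
    children = defaultdict(list)
    for folder_id, parent_id, name in rows:
        children[parent_id].append((folder_id, name))

    lines = []

    def walk(parent_id, depth):
        for folder_id, name in children[parent_id]:
            lines.append("  " * depth + name)
            walk(folder_id, depth + 1)   # descend into this folder's subfolders

    walk(None, 0)                        # top-level folders have no parent
    return lines


tree = build_tree(folders_table)
print("\n".join(tree))
```

A folder with no parent ID sits at the top level; any row that carries a parent ID is drawn one level beneath that parent.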
When an Outlook client opens a user's mailbox, the count of new items column in the folders table alerts the client to highlight the folder name in the user's Inbox. The highlighted folder name signals the user to review the contents of the folder.
Each folder has its own message/folder table that contains header information (all of it stored as Messaging API, or MAPI, properties) for all the messages in the folder. By maintaining a folder's message header information in a separate message/folder table, Exchange lets each folder sort its own messages. (Table 3 shows the contents of a message/folder table.) With message/folder tables, folders do not need to request individual message data from one large table. Exchange's message/folder table system minimizes data transmission between client and server when clients display information about a folder.
The message ID (another GUID) links each row in a message/folder table to message content. When a user selects an email item in the Inbox and double-clicks the item to read it, the client uses the message ID to retrieve the message content from the message table and combines the header and content information to populate the form that displays the complete message for the user. (Table 4 shows the type of data a message table contains.) Message body content is stored in Rich Text Format (RTF). If the message contains an attachment, Exchange places a pointer to the attachment in the Attachment Pointer field, and the client can use the pointer to retrieve the attachment. The Usage Count field contains a count of all the folders that contain a reference to a particular message. The count decreases as users delete references to the message. When the usage count reaches zero, Exchange removes the message row from the message table.
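The usage count behaves like a reference count. The following Python sketch shows the idea under simplifying assumptions: the dictionaries, the function names, and folder labels such as "ann/Inbox" are hypothetical, and header rows are reduced to bare message IDs.

```python
message_table = {}    # message_id -> {"body": str, "usage_count": int}
message_folder = {}   # folder label -> set of message_ids (headers omitted)


def file_message(message_id, body, folder_names):
    """Store the content once and record one reference per folder."""
    message_table[message_id] = {"body": body, "usage_count": len(folder_names)}
    for name in folder_names:
        message_folder.setdefault(name, set()).add(message_id)


def delete_reference(folder_name, message_id):
    """Drop one folder's reference; remove the content when none remain."""
    message_folder[folder_name].remove(message_id)
    row = message_table[message_id]
    row["usage_count"] -= 1
    if row["usage_count"] == 0:
        del message_table[message_id]


file_message("m1", "Quarterly report", ["ann/Inbox", "bob/Inbox"])
delete_reference("ann/Inbox", "m1")
still_stored = "m1" in message_table    # bob's folder still references it
delete_reference("bob/Inbox", "m1")
removed = "m1" not in message_table     # last reference gone, row removed
```

Deleting a message from one folder therefore costs almost nothing. The content row disappears only when the final reference is deleted.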
In Screen 1 you can see a typical email message. Exchange took data from the Information Store's relational tables and used the process just described to construct this familiar user interface.
The Exchange Transaction Model
In Exchange, all transactions consist of a series of multiple operations against different tables, and Exchange will not accept a transaction unless that series of operations is complete (such indivisible operations are known as "atomic operations"). Let's review what happens when a new email message is delivered to four users on one server.
First, Exchange reads the mailbox table to verify that mailboxes exist for each of the message recipients. Second, Exchange reads the folders table to locate the rows that correspond to each recipient's Inbox. Third, Exchange updates the message table by adding a new row that contains the content of the new message. Fourth, Exchange updates the message/folder table for each of the four Inbox folders, providing header information for the new message. Pointers link each message/folder table with the new message's row in the message table. If the new message contains attachments, Exchange adds rows corresponding to the attachments to the attachments table. Fifth, Exchange updates the Inbox row in the folders table for each user by incrementing the Count of Items and Count of New Items columns. At this point the transaction is complete.
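The five steps above can be sketched as a single all-or-nothing transaction. I use Python's built-in sqlite3 module purely as a stand-in; Exchange does not use SQLite, and the table and column names here are assumptions. The fail_after parameter is a hypothetical hook that simulates a failure partway through delivery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mailbox (user TEXT PRIMARY KEY);
    CREATE TABLE folders (user TEXT, name TEXT, items INTEGER, new_items INTEGER);
    CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT, usage_count INTEGER);
    CREATE TABLE message_folder (user TEXT, folder TEXT, message_id INTEGER,
                                 subject TEXT);
""")
for user in ("ann", "bob", "cho", "dev"):
    conn.execute("INSERT INTO mailbox VALUES (?)", (user,))
    conn.execute("INSERT INTO folders VALUES (?, 'Inbox', 0, 0)", (user,))
conn.commit()


def deliver(subject, body, recipients, fail_after=None):
    """Apply every step or none: 'with conn' commits on success and
    rolls back if any step raises."""
    with conn:
        for user in recipients:
            # Steps 1 and 2: the mailbox and its Inbox row must exist.
            if conn.execute("SELECT 1 FROM mailbox WHERE user=?",
                            (user,)).fetchone() is None:
                raise LookupError(f"no mailbox for {user}")
            if conn.execute("SELECT 1 FROM folders WHERE user=? AND name='Inbox'",
                            (user,)).fetchone() is None:
                raise LookupError(f"no Inbox for {user}")
        # Step 3: one content row, however many recipients there are.
        cur = conn.execute("INSERT INTO message (body, usage_count) VALUES (?, ?)",
                           (body, len(recipients)))
        for done, user in enumerate(recipients, start=1):
            # Step 4: header row that points at the content row.
            conn.execute("INSERT INTO message_folder VALUES (?, 'Inbox', ?, ?)",
                         (user, cur.lastrowid, subject))
            # Step 5: bump both counters on the recipient's Inbox row.
            conn.execute("UPDATE folders SET items = items + 1, "
                         "new_items = new_items + 1 "
                         "WHERE user=? AND name='Inbox'", (user,))
            if fail_after is not None and done == fail_after:
                raise RuntimeError("simulated failure mid-transaction")


deliver("Status", "All systems normal", ["ann", "bob", "cho", "dev"])
try:
    deliver("Oops", "Half-delivered?", ["ann", "bob"], fail_after=1)
except RuntimeError:
    pass  # the partial work from the second delivery was rolled back

message_rows = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
header_rows = conn.execute("SELECT COUNT(*) FROM message_folder").fetchone()[0]
ann_counts = conn.execute(
    "SELECT items, new_items FROM folders WHERE user='ann'").fetchone()
```

After the simulated failure, the tables look exactly as they did before the second delivery started: one message row, four header rows, and counters that reflect only the first message.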
Exchange saves each transaction by writing it to the current transaction log (edb.log), and also to a queue in memory. The Information Store manipulates the contents of the memory queue to carry out the eventual writes to the database in the most efficient manner. For example, if system load is heavy, Exchange regularly updates the rows for Inbox folders in the folders table. By referring to the memory queue, Exchange can organize the transactions and commit changes for Inbox folders in one operation. Exchange always gives client interactions higher priority than background processing, so when system load is heavy, transactions can build up in the memory queue. Exchange then flushes the transactions from the memory queue when the load on the system decreases.
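Here is a rough Python sketch of why the memory queue helps. Every name in it is hypothetical, and the real engine works on database pages rather than dictionaries; the point is only that several queued changes to the same row can be combined into one database write, while the log still records every change.

```python
transaction_log = []    # stands in for edb.log: append-only, written first
memory_queue = []       # changes not yet written to the database
database = {"ann/Inbox": 0, "bob/Inbox": 0}   # folder row -> item count
database_writes = 0     # how many row writes the flush actually performed


def record(folder, delta):
    """Log the change, then queue it in memory for a later database write."""
    transaction_log.append((folder, delta))
    memory_queue.append((folder, delta))


def flush():
    """Combine queued changes per row so each row is written once."""
    global database_writes
    combined = {}
    for folder, delta in memory_queue:
        combined[folder] = combined.get(folder, 0) + delta
    for folder, delta in combined.items():
        database[folder] += delta
        database_writes += 1
    memory_queue.clear()


for _ in range(3):
    record("ann/Inbox", 1)   # three new messages for one busy Inbox
record("bob/Inbox", 1)
flush()
```

Four transactions reach the log, but the flush touches only two database rows.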
Client notification of a transaction occurs after the transaction is completed. MAPI users receive a remote procedure call (RPC) notification. Other users, such as those using Outlook Web Access or Post Office Protocol 3 (POP3), must check the Inbox at intervals to find new messages.
The Exchange Information Store is therefore composed of databases, transactions in memory, and logs of those transactions. For an administrator to focus exclusively on managing the databases would be a mistake. In an Exchange operational environment, you must manage the databases, transactions in memory, and transaction logs as a single entity.
In Exchange, transaction logs are a crucial element: every change to an item stored in an Exchange database is written to a transaction log first. If an error occurs and results in system memory loss, the data in the transaction logs lets Exchange recover all transactions that were not committed to the database at the time of the memory loss. Keeping transaction logs on the same disk as the Exchange databases is risky: If you lose your Exchange databases, you'll also lose the transaction logs that let you update a backup copy of the databases.
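Recovery from the logs amounts to replaying whatever the database had not yet absorbed. The Python sketch below assumes each log entry carries a sequence number and that the database remembers the last entry it committed; both details, and all the names, are my assumptions for illustration rather than a description of the actual log format.

```python
def recover(database, log, last_committed):
    """Replay log entries numbered above last_committed onto a copy
    of the database and return the recovered copy."""
    recovered = dict(database)
    for sequence, folder, delta in log:
        if sequence > last_committed:
            recovered[folder] = recovered.get(folder, 0) + delta
    return recovered


# Hypothetical state after a crash: three logged changes, but only
# entry 1 had reached the database when memory was lost.
log = [(1, "ann/Inbox", 1), (2, "bob/Inbox", 1), (3, "ann/Inbox", 1)]
on_disk = {"ann/Inbox": 1, "bob/Inbox": 0}

recovered = recover(on_disk, log, last_committed=1)
```

The same replay works against a restored backup, which is why the logs must survive the loss of the database disk.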
I am an Exchange administrator and server specialist at Tennessee Valley Authority. I have been an Exchange advocate since I took the Exchange Beta class several years ago. I helped complete a 5-week project to upgrade 33 Exchange AlphaServers (2100As and 1000As) with 13,000 mailboxes from NT 3.51 and Exchange 4.0 to NT 4.0 and Exchange 5.5. April’s article couldn’t have been better timed. In future articles, I’d like to see information about the EDBUTIL replacement (ESEUTIL) and articles that deal with clustering considerations.
--Charles H. Chance

Thanks for the feedback on the article. I’m glad you liked it. The magazine publishes monthly features on Exchange (for information about back issues, go to the Web site at http://www.winntmag.com). Windows NT Magazine also publishes a monthly newsletter (Exchange Administrator, http://www.winntmag.com/exchange) that you might like to check out. “Exchange 5.5/E and Microsoft Cluster Server” (February) covers clustering. “Maintaining Exchange’s Information Store” (May) discusses the EDBUTIL and ESEUTIL utilities.
--Tony Redmond
Charles H. Chance August 10, 1999