Search in Oracle DBA

Monday, July 12, 2010

About Hash Clusters

About Hash Clusters


Storing a table in a hash cluster is an optional way to improve the performance of data retrieval. A hash cluster provides an alternative to a nonclustered table with an index or an index cluster. With an indexed table or index cluster, Oracle Database locates the rows in a table using key values that the database stores in a separate index. To use hashing, you create a hash cluster and load tables into it. The database physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function.

Oracle Database uses a hash function to generate a distribution of numeric values, called hash values, that are based on specific cluster key values. The key of a hash cluster, like the key of an index cluster, can be a single column or composite key (multiple column key). To find or store a row in a hash cluster, the database applies the hash function to the cluster key value of the row. The resulting hash value corresponds to a data block in the cluster, which the database then reads or writes on behalf of the issued statement.

To find or store a row in an indexed table or cluster, a minimum of two (there are usually more) I/Os must be performed:

•One or more I/Os to find or store the key value in the index

•Another I/O to read or write the row in the table or cluster


In contrast, the database uses a hash function to locate a row in a hash cluster; no I/O is required. As a result, a minimum of one I/O operation is necessary to read or write a row in a hash cluster.

When to Use Hash Clusters


This section helps you decide when to use hash clusters by contrasting situations where hashing is most useful against situations where there is no advantage. If you find your decision is to use indexing rather than hashing, then you should consider whether to store a table individually or as part of a cluster.


Situations Where Hashing Is Useful


Hashing is useful when you have the following conditions:



•Most queries are equality queries on the cluster key:



SELECT ... WHERE cluster_key = ...;



In such cases, the cluster key in the equality condition is hashed, and the corresponding hash key is usually found with a single read. In comparison, for an indexed table the key value must first be found in the index (usually several reads), and then the row is read from the table (another read).



•The tables in the hash cluster are primarily static in size so that you can determine the number of rows and amount of space required for the tables in the cluster. If tables in a hash cluster require more space than the initial allocation for the cluster, performance degradation can be substantial because overflow blocks are required.

Situations Where Hashing Is Not Advantageous


Hashing is not advantageous in the following situations:



•Most queries on the table retrieve rows over a range of cluster key values. For example, in full table scans or queries such as the following, a hash function cannot be used to determine the location of specific hash keys. Instead, the equivalent of a full table scan must be done to fetch the rows for the query.



SELECT . . . WHERE cluster_key < . . . ;



With an index, key values are ordered in the index, so cluster key values that satisfy the WHERE clause of a query can be found with relatively few I/Os.

•The table is not static, but instead is continually growing. If a table grows without limit, the space required over the life of the table (its cluster) cannot be predetermined.

•Applications frequently perform full-table scans on the table and the table is sparsely populated. A full-table scan in this situation takes longer under hashing.

•You cannot afford to preallocate the space that the hash cluster will eventually need.

Creating Hash ClustersA hash cluster is created using a CREATE CLUSTER statement, but you specify a HASHKEYS clause. The following example contains a statement to create a cluster named trial_cluster that stores the trial table, clustered by the trialno column (the cluster key); and another statement creating a table in the cluster.




CREATE CLUSTER trial_cluster (trialno NUMBER(5,0))

PCTUSED 80 PCTFREE 5

TABLESPACE users

STORAGE (INITIAL 250K NEXT 50K

MINEXTENTS 1 MAXEXTENTS 3

PCTINCREASE 0)

HASH IS trialno HASHKEYS 150;



CREATE TABLE trial (

trialno NUMBER(5,0) PRIMARY KEY,

...)

CLUSTER trial_cluster (trialno);



As with index clusters, the key of a hash cluster can be a single column or a composite key (multiple column key). In this example, it is a single column.



The HASHKEYS value, in this case 150, specifies and limits the number of unique hash values that can be generated by the hash function used by the cluster. The database rounds the number specified to the nearest prime number.



If no HASH IS clause is specified, the database uses an internal hash function. If the cluster key is already a unique identifier that is uniformly distributed over its range, you can bypass the internal hash function and specify the cluster key as the hash value, as is the case in the preceding example. You can also use the HASH IS clause to specify a user-defined hash function.
You cannot create a cluster index on a hash cluster, and you need not create an index on a hash cluster key.

For additional information about creating tables in a cluster, guidelines for setting parameters of the CREATE CLUSTER statement common to index and hash clusters, and the privileges required to create any cluster, see Chapter 17, " Managing Clusters". The following sections explain and provide guidelines for setting the parameters of the CREATE CLUSTER statement specific to hash clusters:

Creating a Sorted Hash Cluster


In a sorted hash cluster, the rows corresponding to each value of the hash function are sorted on a specified set of columns in ascending order, which can improve response time during subsequent operations on the clustered data.



For example, a telecommunications company needs to store detailed call records for a fixed number of originating telephone numbers through a telecommunications switch. From each originating telephone number there can be an unlimited number of telephone calls.



Calls are stored as they are made and processed later in first-in, first-out order (FIFO) when bills are generated for each originating telephone number. Each call has a detailed call record that is identified by a timestamp. The data that is gathered is similar to the following:

In the following SQL statements, the telephone_number column is the hash key. The hash cluster is sorted on the call_timestamp and call_duration columns. The number of hash keys is based on 10-digit telephone numbers.




CREATE CLUSTER call_detail_cluster (

telephone_number NUMBER,

call_timestamp NUMBER SORT,

call_duration NUMBER SORT )

HASHKEYS 10000 HASH IS telephone_number

SIZE 256;



CREATE TABLE call_detail (

telephone_number NUMBER,

call_timestamp NUMBER SORT,

call_duration NUMBER SORT,

other_info VARCHAR2(30) )

CLUSTER call_detail_cluster (

telephone_number, call_timestamp, call_duration );



Given the sort order of the data, the following query would return the call records for a specified hash key by oldest record first.



SELECT * WHERE telephone_number = 6505551212;

Creating Single-Table Hash ClustersYou can also create a single-table hash cluster, which provides fast access to rows in a table. However, this table must be the only table in the hash cluster. Essentially, there must be a one-to-one mapping between hash keys and data rows. The following statement creates a single-table hash cluster named peanut with the cluster key variety:




CREATE CLUSTER peanut (variety NUMBER)

SIZE 512 SINGLE TABLE HASHKEYS 500;



The database rounds the HASHKEYS value up to the nearest prime number, so this cluster has a maximum of 503 hash key values, each of size 512 bytes. The SINGLE TABLE clause is valid only for hash clusters. HASHKEYS must also be specified.

Controlling Space Use Within a Hash Cluster


When creating a hash cluster, it is important to choose the cluster key correctly and set the HASH IS, SIZE, and HASHKEYS parameters so that performance and space use are optimal. The following guidelines describe how to set these parameters.



Choosing the Key

Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. For example, consider the emp table in a hash cluster. If queries often select rows by employee number, the empno column should be the cluster key. If queries often select rows by department number, the deptno column should be the cluster key. For hash clusters that contain a single table, the cluster key is typically the entire primary key of the contained table.



The key of a hash cluster, like that of an index cluster, can be a single column or a composite key (multiple column key). A hash cluster with a composite key must use the internal hash function of the database.



Setting HASH IS

Specify the HASH IS parameter only if the cluster key is a single column of the NUMBER datatype, and contains uniformly distributed integers. If these conditions apply, you can distribute rows in the cluster so that each unique cluster key value hashes, with no collisions (two cluster key values having the same hash value), to a unique hash value. If these conditions do not apply, omit this clause so that you use the internal hash function.



Setting SIZE

SIZE should be set to the average amount of space required to hold all rows for any given hash key. Therefore, to properly determine SIZE, you must be aware of the characteristics of your data:



•If the hash cluster is to contain only a single table and the hash key values of the rows in that table are unique (one row for each value), SIZE can be set to the average row size in the cluster.



•If the hash cluster is to contain multiple tables, SIZE can be set to the average amount of space required to hold all rows associated with a representative hash value.



Further, once you have determined a (preliminary) value for SIZE, consider the following. If the SIZE value is small (more than four hash keys can be assigned for each data block) you can use this value for SIZE in the CREATE CLUSTER statement. However, if the value of SIZE is large (four or fewer hash keys can be assigned for each data block), then you should also consider the expected frequency of collisions and whether performance of data retrieval or efficiency of space usage is more important to you.



•If the hash cluster does not use the internal hash function (if you specified HASH IS) and you expect few or no collisions, you can use your preliminary value of SIZE. No collisions occur and space is used as efficiently as possible.



•If you expect frequent collisions on inserts, the likelihood of overflow blocks being allocated to store rows is high. To reduce the possibility of overflow blocks and maximize performance when collisions are frequent, you should adjust SIZE as shown in the following chart.




Overestimating the value of SIZE increases the amount of unused space in the cluster. If space efficiency is more important than the performance of data retrieval, disregard the adjustments shown in the preceding table and use the original value for SIZE.

Setting HASHKEYS


For maximum distribution of rows in a hash cluster, the database rounds the HASHKEYS value up to the nearest prime number.



Controlling Space in Hash Clusters

The following examples show how to correctly choose the cluster key and set the HASH IS, SIZE, and HASHKEYS parameters. For all examples, assume that the data block size is 2K and that on average, 1950 bytes of each block is available data space (block size minus overhead).



Controlling Space in Hash Clusters: Example 1

You decide to load the emp table into a hash cluster. Most queries retrieve employee records by their employee number. You estimate that the maximum number of rows in the emp table at any given time is 10000 and that the average row size is 55 bytes.



In this case, empno should be the cluster key. Because this column contains integers that are unique, the internal hash function can be bypassed. SIZE can be set to the average row size, 55 bytes. Note that 34 hash keys are assigned for each data block. HASHKEYS can be set to the number of rows in the table, 10000. The database rounds this value up to the next highest prime number: 10007.



CREATE CLUSTER emp_cluster (empno

NUMBER)

. . .

SIZE 55

HASH IS empno HASHKEYS 10000;

Controlling Space in Hash Clusters: Example 2

Conditions similar to the previous example exist. In this case, however, rows are usually retrieved by department number. At most, there are 1000 departments with an average of 10 employees for each department. Department numbers increment by 10 (0, 10, 20, 30, . . . ).



In this case, deptno should be the cluster key. Since this column contains integers that are uniformly distributed, the internal hash function can be bypassed. A preliminary value of SIZE (the average amount of space required to hold all rows for each department) is 55 bytes * 10, or 550 bytes. Using this value for SIZE, only three hash keys can be assigned for each data block. If you expect some collisions and want maximum performance of data retrieval, slightly alter your estimated SIZE to prevent collisions from requiring overflow blocks. By adjusting SIZE by 12%, to 620 bytes (refer to "Setting SIZE"), there is more space for rows from expected collisions.



HASHKEYS can be set to the number of unique department numbers, 1000. The database rounds this value up to the next highest prime number: 1009.



CREATE CLUSTER emp_cluster (deptno NUMBER)

. . .

SIZE 620

HASH IS deptno HASHKEYS 1000;

Estimating Size Required by Hash Clusters

As with index clusters, it is important to estimate the storage required for the data in a hash cluster.



Oracle Database guarantees that the initial allocation of space is sufficient to store the hash table according to the settings SIZE and HASHKEYS. If settings for the storage parameters INITIAL, NEXT, and MINEXTENTS do not account for the hash table size, incremental (additional) extents are allocated until at least SIZE*HASHKEYS is reached. For example, assume that the data block size is 2K, the available data space for each block is approximately 1900 bytes (data block size minus overhead), and that the STORAGE and HASH parameters are specified in the CREATE CLUSTER statement as follows:



STORAGE (INITIAL 100K

NEXT 150K

MINEXTENTS 1

PCTINCREASE 0)

SIZE 1500

HASHKEYS 100



In this example, only one hash key can be assigned for each data block. Therefore, the initial space required for the hash cluster is at least 100*2K or 200K. The settings for the storage parameters do not account for this requirement. Therefore, an initial extent of 100K and a second extent of 150K are allocated to the hash cluster.



Alternatively, assume the HASH parameters are specified as follows:



SIZE 500 HASHKEYS 100



In this case, three hash keys are assigned to each data block. Therefore, the initial space required for the hash cluster is at least 34*2K or 68K. The initial settings for the storage parameters are sufficient for this requirement (an initial extent of 100K is allocated to the hash cluster).



Altering Hash Clusters

You can alter a hash cluster with the ALTER CLUSTER statement:



ALTER CLUSTER emp_dept . . . ;



The implications for altering a hash cluster are identical to those for altering an index cluster, described in "Altering Clusters". However, the SIZE, HASHKEYS, and HASH IS parameters cannot be specified in an ALTER CLUSTER statement. To change these parameters, you must re-create the cluster, then copy the data from the original cluster.



Dropping Hash Clusters

You can drop a hash cluster using the DROP CLUSTER statement:



DROP CLUSTER emp_dept;



A table in a hash cluster is dropped using the DROP TABLE statement. The implications of dropping hash clusters and tables in hash clusters are the same as those for dropping index clusters.

Viewing Information About Hash Clusters


The following views display information about hash clusters:

 

What is a cancel-based recovery?

A cancel-based recovery is a type of user-managed incomplete recovery that is performed by specifying the UNTIL CANCEL clause with the RECOVER command (a SQL*Plus command that is used to recover a database). The UNTIL CANCEL clause specifies that the recovery process will continue until the user manually cancels the recovery process issuing the CANCEL command.

In a cancel-based incomplete recovery, the recovery process proceeds by prompting the user with the suggested archived redo log files’ names. The recovery process stops when the user specifies CANCEL instead of specifying an archived redo log file’s name. If the user does not specify CANCEL, the recovery process automatically stops when all the archived redo log files have been applied to the database.

A cancel-based recovery is usually performed when the requirement is to recover up to a particular archived redo log file. For example, if one of the archived redo log files required for the complete recovery is corrupt or missing, the only option is to recover up to the missing archived redo log file

What is time-based recovery?

Time-based recovery is a recovery technique, using which a database is recovered up to a specified time. For example, if a user mistakenly drops a table, the database administrator will be required to perform an incomplete recovery up to a point in time just before the user dropped the table. The UNTIL TIME clause is used with the RECOVER command in order to specify the time up to which the database is to be recovered. A time-based recovery can be performed manually as well as by using RMAN.

Wednesday, May 19, 2010

Oracle Memory Architecture

Instance(Memory): Instance is a combination of memory structure and process structure. Memory structure is SGA and Process structure is background processes.




System Global Area (SGA)

A system global area (SGA) is a group of shared memory structures that contain data and control information for one Oracle database instance. If multiple users are concurrently connected to the same instance, the data in the instance's SGA is "shared" among the users. Consequently, the SGA is sometimes referred to as the "shared global area".

Oracle automatically allocates memory for an SGA when you start an instance and the operating system reclaims the memory when you shut down the instance. Each instance has its own SGA.

The SGA is read-write; all users connected to a multiple-process database instance may read information contained within the instance's SGA, and several processes write to the SGA during execution of Oracle.

The SGA contains the following data structures:

• the database buffer cache
• the redo log buffer
• the shared pool
• the large pool (optional)
• the java pool (optional)
• the stream pool (optional)
• the data dictionary cache
• other miscellaneous information

Part of the SGA contains general information about the state of the database and the instance, which the background processes need to access; this is called the fixed SGA. No user data is stored here. The SGA also includes information communicated between processes, such as locking information.

The Database Buffer Cache

The database buffer cache is the portion of the SGA that holds copies of data blocks read from datafiles. All user processes concurrently connected to the instance share access to the database buffer cache.

The database buffer cache and the shared SQL cache are logically segmented into multiple sets. This organization into multiple sets reduces contention on multiprocessor systems.



Multiple Buffer Pools (Keep, Recycle Buffer)

You can configure the database buffer cache with separate buffer pools that either keep data in the buffer cache or make the buffers available for new data immediately after using the data blocks. Particular schema objects (tables, clusters, indexes, and partitions) can then be assigned to the appropriate buffer pool to control the way their data blocks age out of the cache.

• The KEEP buffer pool retains the schema object's data blocks in memory.

• The RECYCLE buffer pool eliminates data blocks from memory as soon as they are no longer needed.

• The DEFAULT buffer pool contains data blocks from schema objects that are not assigned to any buffer pool, as well as schema objects that are explicitly assigned to the DEFAULT pool.

The initialization parameters that configure the KEEP and RECYCLE buffer pools are BUFFER_POOL_KEEP and BUFFER_POOL_RECYCLE.



The Redo Log Buffer

The redo log buffer is a circular buffer in the SGA that holds information about changes made to the database. This information is stored in redo entries. Redo entries contain the information necessary to reconstruct, or redo, changes made to the database by INSERT, UPDATE, DELETE, CREATE, ALTER, or DROP operations. Redo entries are used for database recovery, if necessary.

Redo entries are copied by Oracle server processes from the user's memory space to the redo log buffer in the SGA. The redo entries take up continuous, sequential space in the buffer. The background process LGWR writes the redo log buffer to the active online redo log file (or group of files) on disk.

The initialization parameter LOG_BUFFER determines the size (in bytes) of the redo log buffer. In general, larger values reduce log file I/O, particularly if transactions are long or numerous. The default setting is four times the maximum data block size for the host operating system.

The Shared Pool


The shared pool portion of the SGA contains three major areas: library cache, dictionary cache, and control structures. Figure  shows the contents of the shared pool.



The total size of the shared pool is determined by the initialization parameter SHARED_POOL_SIZE. The default value of this parameter is 3,500,000 bytes. Increasing the value of this parameter increases the amount of memory reserved for the shared pool, and therefore increases the space reserved for shared SQL areas.

Library Cache


The library cache includes the shared SQL areas, private SQL areas, PL/SQL procedures and packages, and control structures such as locks and library cache handles.



Shared SQL areas must be available to multiple users, so the library cache is contained in the shared pool within the SGA. The size of the library cache (along with the size of the data dictionary cache) is limited by the size of the shared pool.

Shared SQL Areas and Private SQL Areas


Oracle represents each SQL statement it executes with a shared SQL area and a private SQL area. Oracle recognizes when two users are executing the same SQL statement and reuses the shared SQL area for those users. However, each user must have a separate copy of the statement's private SQL area.

Shared SQL Areas


A shared SQL area contains the parse tree and execution plan for a single SQL statement, or for identical SQL statements. Oracle saves memory by using one shared SQL area for multiple identical DML statements, particularly when many users execute the same application. A shared SQL area is always in the shared pool.