2012年4月12日 星期四

Efficient SQL Statements

1. Don't "select *"

2. Use "DECODE" to avoid scan same records

3. Delete data by ROWID (best efficient)

4. "TRUNCATE", not "DELETE"

5. Calculate rows by calculating index column, like COUNT(EMPNO)

6. Where instead of "HAVING"
(X) SELECT REGION, AVG (LOG_SIZE) FROM LOCATION GROUP BY REGION HAVING REGION REGION != ‘SYDNEY’ AND REGION != ‘PERTH’
(O) SELECT REGION, AVG (LOG_SIZE) FROM LOCATION WHERE REGION REGION != ‘SYDNEY’ ND REGION != ‘PERTH’ GROUP BY REGION

7. EXISTS instead of "IN"
(X) SELECT * FROM EMP WHERE EMPNO 》 0 AND DEPTNO IN ( SELECT DEPTNO FROM DEPT WHERE LOC = ‘MELB’)
(O) SELECT * FROM EMP WHERE EMPNO 》 0 AND EXISTS ( SELECT ‘X’ FROM DEPT WHERE DEPT.DEPTNO = EMP.DEPTNO AND LOC = ‘MELB’)

8. "NOT EXISTS" instead of "NOT IN"
(X) SELECT …FROM EMP WHERE DEPT_NO NOT IN (SELECT DEPT_NO FROM DEPT WHERE DEPT_CAT=’A’);
(O) SELECT … FROM EMP A,DEPT B WHERE A.DEPT_NO = B.DEPT(+) AND B.DEPT_NO IS NULL AND B.DEPT_CAT(+) = ‘A’
*(O) SELECT … FROM EMP E WHERE NOT EXISTS ( SELECT ‘X’ FROM DEPT D WHERE D.DEPT_NO = E.DEPT_NO AND DEPT_CAT = ‘A’);

9. "EXISTS" instead of "DISTINCT"
(X) SELECT DISTINCT DEPT_NO,DEPT_NAME FROM DEPT D,EMP E WHERE D.DEPT_NO = E.DEPT_NO
(O) SELECT DEPT_NO,DEPT_NAME FROM DEPT D WHERE EXISTS ( SELECT ‘X’ FROM EMP E WHERE E.DEPT_NO = D.DEPT_NO);

10. Execution by index. * Restructure index

11. Don't perform a function on the indexed colum
(X) SELECT … FROM DEPT WHERE SAL * 12 》 25000;
(O) SELECT … FROM DEPT WHERE SAL 》 25000/12;

12. Don't perform mathematical operations on the indexed column
(X) SELECT ... FROM SALARY WHERE salary + 1 = 10001
(O) SELECT ... FROM SALARY WHERE salary = 10000

13. Don't concatenate a column
(X) SELECT ... FROM EMPLOYEE WHERE firstname || ' ' || lastname = 'JOHN JONES'
(O) SELECT ... FROM EMPLOYEE WHERE firstname = 'JOHN' and lastname = 'JONES'

*14. Include the first column of a concatenated index in the WHERE clause of your statement

15. 》= instead of 》
Suggested DEPTNO is a index column.
(X) SELECT * FROM EMP WHERE DEPTNO 》=4
(O) SELECT * FROM EMP WHERE DEPTNO 》3

16. Don't use 'OR' statements to confuse the Cost Based Optimizer (CBO). It will even ignore optimizer hints in this situation. The only way of guaranteeing the use of indexes in these situations is to use an INDEX hint.

Oracle automatic statistics gathering

STATISTICS rules
1. 10g and above: Oracle statistics saved in data dictionary, and optimizer chooses the best execution plan for every SQL based on statistics, by CBO (Cost-BasedOptimization) only.
2. 9g: Oracle chooses execution plan by RBO(Rule-BasedOptimization)or CBO.


1. Cost in execution plan is summation of CPU and I/O as default.
2. Parameter Optimizer_mode to control optimizer
2.1. first_rows: The optimizer uses a mix of costs and heuristics to find a best plan for fast delivery of the first few rows.
2.2. all_rows: Default. The optimizer uses a cost-based approach for all SQL statements in the session and optimizes with a goal of best throughput (minimum resource use to complete the entire statement)
2.3. first_rows_n: The optimizer uses a cost-based approach and optimizes with a goal of best response time to return the first n rows (where n = 1, 10, 100, 1000)
2.4. rule, choose: only for 9i.
3. Parameter OPTIMIZER_INDEX_COST_ADJ: OPTIMIZER_INDEX_COST_ADJ lets you tune optimizer behavior for access path selection to be more or less index friendly—that is, to make the optimizer more or less prone to selecting an index access path over a full table scan.
3.1. rate of cost of MultiBlock I/O (table scan) and Single-block I/O (index scan). Oracle identifies OPTIMIZER_INDEX_COST_ADJ to affect CBO.
3.2. OPTIMIZER_INDEX_COST_ADJ less, table full scan
3.3. The default for this parameter is 100 percent, at which the optimizer evaluates index access paths at the regular cost.



STATISTICS
1. Table statistics:Number of rows, Number of blocks, Average row length;
2. Column statistics:Number of distinct values (NDV) in column, Number of nulls in column, Data distribution (histogram);
3. Index statistics:Number of leaf blocks, Levels, Clustering factor;
4. System statistics:I/O performance and utilization、CPU performance and utilization。==>only can be gathered manually


STATISTIC_LEVEL
statistics, reporting status information controlled by parameters for STATISTIC_LEVEL can be viewed in V$STATISTICS_LEVEL


Statistics not for...
1.tables being deleted or truncated in session and re-create
==>set statistics for these tables NULL, then Oracle will dynamically gather statistics a part of by optimized SQL
==>fix typical statistics for most representative table situation and take as optimized statistics
2.heavy-loading tables being inserted or updated largely in session with more than 10% modified
==>manually gather statistics after large modification as SQL procedure or database operation


MUST gather statistics manually
1. External table
2. Automatic statistics gathering stopped
3. system statistics
4. fixed objects, like dynamic performance tables



difference between 11g and 10g
1. Gather statistics with same options for all objects in 10g; options for different gathering can be setup separately for global, database, schema, and table
2. New options
2.1. PUBLISH: whether publish statistics to data dictionary after gathering
2.2. INCREMENTAL: incrementally gather global statistics for partition table
2.3. STALE_PERCENT: modified percentage(default 10%) of statistics for certain object is outdated

11 LEVELS for DYNAMIC SAMPLING
http://download.oracle.com/docs/cd/E11882_01/server.112/e10821/stats.htm#PFGRF94760

DYNAMIC SAMPLING DISADVANTAGES
1. Selection bias for large database
2. Taking resource, not suitable for OLTP, repeated execution, bound variables, few hard analysis

DYNAMIC SAMPLING ADVANTAGES
1. Suitable for data warehouse or OLAP, since SQL execution costs much more than SQL parsing
2. Support CBO for more information
3. CBO only can gather statistics for global temporary table by dynamic sampling
4. Do statistics for different rows.













http://space.itpub.net/35489/viewspace-608036