Normalization Basics
Basically, Normalization is the process of organizing data into database efficiently. There are two goals of normalization:-
1- Eliminating redundant data
2- Ensuring data dependencies make sense
The database community has developed a series of guidelines for ensuring that data is normalized in database. These series are
referred to as normal forms and are numbered from 1NF to 5NF. In general practice, we used 1NF, 2NF and 3NF. Occasional we use 4NF. 5NF is very rarely seen and is out of scope this article.
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
* Eliminate duplicate columns from the same table.
* Create separate tables for each group of related data and identify each row with a unique column or set of columns
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
* Meet all the requirements of the first normal form.
* Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
* Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
* Meet all the requirements of the second normal form.
* Remove columns that are not dependent upon the primary key.
Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
* Meet all the requirements of the third normal form.
* A relation is in 4NF if it has no multi-valued dependencies.
Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
I think its enough for this article. I will update this article with practical example soon.
Feature of SQL Server 2008
Here is brief summary of the differences and I will try to explain why SQL Server 2008 is rocking.
1- Plug is model for SSMS:- SSMS 2005 also had a plug in model but this was not published.
2- Inline variable assignment:-
Instead of:
DECLARE @myVar int
SET @myVar = 5
you can do it in one line:
DECLARE @myVar int = 5
3- Filtered index:- it allows you to create an index while specifying what rows are not to be in the index. For example, index all rows where Status != null.
4- Intellisense:- in the SQL Server Management Studio (SSMS). This has been previously possible in SQL Server 2000 and 2005 with Intellisense use of 3rd party add-ins like SQL Prompt ($195). But these tools are a horrible hack at best.
Let's play with Trigger
1- Triggers do not accept parameter whereas procedure can.
2- A trigger is executed implicitly and to execute a procedure, it has to be explicitly called by user.
How to apply database triggers
A trigger has three basic parts:-
1- A triggering statement
2- A trigger restriction
3- A trigger action
Triggering Statement:-
It is a sql statement that causes a trigger to be fired. It can be insert, update or delete statement for a specific table.
Trigger restriction:-
A trigger restriction specifies a logical expression that must be TRUE for the trigger to fire. A trigger restriction specifies using WHEN clause.
Trigger Action:-
A trigger action code is executed when a triggering statement is encountered and any trigger restirction evaluates to true.
Type of triggers
Row Triggers
A row trigger is fired each time a row in the table is affected by the triggering statement. For example, if an update statement updates multiple rows of a table, a row trigger is fired once for each row affected by the update statement.
Statement Triggers
A statement trigger is fired once on behalf of the triggering statement, independent of the number of rows the triggering statement affects. Statement triggers should be used when a triggering statement affects rows in a table but the processing required is completely independent of the number of rows affected.
Before V/s After triggers
BEFORE triggers executes the trigger action before the triggering statement. These types of triggers are commonly used in the following situations:-
a) BEFORE triggers are used when the trigger action should determine wheather or not the triggering statement should be allowed to complete.
b) BEFORE triggers are used to derive specifc column values before completing triggering insert or update statement.
AFTER trigger executes the trigger action after the triggering statement is executed. These types of triggers are commonly used in the following situations:-
a) AFTER triggers are used when you want the triggering statement to complete before executing the trigger action.
b) If a BEFORE trigger is already present, an AFTER trigger can perform different actions on the same triggering statement.
The following example demonstrate how to create trigger that displayes the current system time when a row is inserted into the table.
SET NOCOUNT ON
CREATE TABLE Source (Sou_ID int IDENTITY, Sou_Desc varchar(10))
GO
CREATE TRIGGER tr_Source_INSERT
ON Source
FOR INSERT
AS
PRINT GETDATE()
GO
INSERT Source (Sou_Desc) VALUES ('Test 1')
-- Results --
Apr 17 2010 9:56AM
When to use triggers
We should use trigger when we need to perform a certain action as a result of an INSERT, UPDATE or DELETE is used.
CREATE TABLE Orders (Ord_ID int IDENTITY, Ord_Priority varchar(10))
GO
CREATE TRIGGER tr_Orders_INSERT
ON Orders
FOR INSERT
AS
IF (SELECT COUNT(*) FROM inserted WHERE Ord_Priority = 'High') = 1
BEGIN
PRINT 'Email Code Goes Here'
END
GO
INSERT Orders (Ord_Priority) VALUES ('High')
-- Results --
Email Code Goes Here
how to know database size using query
or you can try...
exec sp_spaceused
Stored Proceures Basics
Other than performance, stored procedures are also helpful because they provide another layer of abstraction for your web application. For instance, you can change a query in a stored procedure and get different results without having to recompile your objects. Using stored procedures also makes your objects cleaner, and SQL Server’s convenient backup tool makes it easy to back them up.
There are various options that can be used to create stored procedures. In these next few topics we will discuss creating a simple stored procedure to more advanced options that can be used when creating stored procedures.
Creating a simple stored procedure
To create a stored procedure to do this the code would look like this:
CREATE PROCEDURE dbo.USP_projName_modulename
AS SELECT * FROM dbo.tblName
GO
To call the procedure to return the contents from the table specified, the code would be:
EXEC USP_projName_modulename
--or just simply
USP_projName_modulename
When creating a stored procedure you can either use CREATE PROCEDURE or CREATE PROC. After the stored procedure name you need to use the keyword "AS" and then the rest is just the regular SQL code that you would normally execute.
One thing to note is that you cannot use the keyword "GO" in the stored procedure. Once the SQL Server compiler sees "GO" it assumes it is the end of the batch.
Stored procedure with parameters
The real power of stored procedures is the ability to pass parameters and have the stored procedure handle the differing requests that are made.
In this example we will query the address table from the master database, but instead of getting back all records we will limit it to just a particular city. This example assumes there will be an exact match on the City value that is passed.
CREATE PROCEDURE dbo.USP_abc_GetMembersBasedOnCity
(
@City nvarchar(30)
)
AS
SELECT *
FROM dbo.address
WHERE City = @City
GO
To call this stored procedure we would execute it as follows:
EXEC USP_abc_GetMembersBasedOnCity @City = 'Delhi'
In most cases it is always a good practice to pass in all parameter values, but sometimes it is not possible. So in this example we use the NULL option to allow you to not pass in a parameter value. If we create and run this stored procedure as is it will not return any data, because it is looking for any City values that equal NULL.
CREATE PROCEDURE dbo.USP_abc_GetMembersBasedOnCity @City nvarchar(30) = NULL
AS
SELECT *
FROM dbo.address
WHERE City = @City
GO
We could change this stored procedure and use the ISNULL function to get around this. So if a value is passed it will use the value to narrow the result set and if a value is not passed it will return all records.
CREATE PROCEDURE dbo.USP_abc_GetMembersBasedOnCity @City nvarchar(30) = NULL
AS
SELECT *
FROM dbo.address
WHERE City = ISNULL(@City,City)
GO
Above we discussed how to pass parameters inside SPs, but there is one other option, to pass parameter values backout from SP.
Setting up output paramters for a stored procedure is basically the same as setting up input parameters, the only difference is that you use the OUTPUT clause after the parameter name to specify that it should return a value. The output clause can be specified by either using the keyword "OUTPUT" or just "OUT".
CREATE PROCEDURE dbo.USP_abc_GetMembersBasedOnCity (
@City nvarchar(30),
@AddressCount int OUTPUT
)
AS
SELECT @AddressCount = count(*)
FROM address
WHERE City = @City
Go
To call this stored procedure we would execute it as follows. First we are going to declare a variable, execute the stored procedure and then select the returned valued.
DECLARE @AddressCount int
EXEC USP_abc_GetMembersBasedOnCity @City = 'Calgary', @AddressCount = @AddressCount OUTPUT
SELECT @AddressCount
Using try catch in SQL Server stored procedures
Doing error handling in SQL Server has not always been the easiest thing, so this option definitely makes it much easier to code for and handle errors.
If you are not familiar with the Try...Catch paradigm it is basically two blocks of code with your stored procedures that lets you execute some code, this is the Try section and if there are errors they are handled in the Catch section.
Let's take a look at an example of how this can be done. As you can see we are using a basic SELECT statement that is contained within the TRY section, but for some reason if this fails it will run the code in the CATCH section and return the error information.
CREATE PROCEDURE uspTryCatchTest
AS
BEGIN TRY
SELECT 1/0
END TRY
BEGIN CATCH
SELECT ERROR_NUMBER() AS ErrorNumber
,ERROR_SEVERITY() AS ErrorSeverity
,ERROR_STATE() AS ErrorState
,ERROR_PROCEDURE() AS ErrorProcedure
,ERROR_LINE() AS ErrorLine
,ERROR_MESSAGE() AS ErrorMessage;
END CATCH
Naming conventions for SQL Server stored procedures
1- One of the things you do not want to use as a standard is "sp_". This is a standard naming convention that is used in the master database. If you do not specify the database where the object is, SQL Server will first search the master database to see if the object exists there and then it will search the user database. So avoid using this as a naming convention.
I liked to first give the action that the stored procedure takes and then give it a name representing the object it will affect.
So based on the actions that you may take with a stored procedure, you may use:
* Insert
* Delete
* Update
* Select
* Get
* Validate
* etc...
So here are a few examples:
* USP_projName_personDelete
* USP_projName_personGetData
Reducing amount of network data for SQL Server stored procedures
There are many tricks that can be used when you write T-SQL code. One of these is to reduce the amount of network data for each statement that occurs within your stored procedures. Every time a SQL statement is executed it returns the number of rows that were affected. By using "SET NOCOUNT ON" within your stored procedure you can shut off these messages and reduce some of the traffic.
Not using SET NOCOUNT ON
CREATE PROCEDURE uspGetAddress @City nvarchar(30)
AS
SELECT *
FROM AdventureWorks.Person.Address
WHERE City = @City
GO
The messages that are returned would be similar to this:
20 rows affected
Using SET NOCOUNT ON
-- using SET NOCOUNT ON
CREATE PROCEDURE uspGetAddress @City nvarchar(30)
AS
SET NOCOUNT ON
SELECT *
FROM AdventureWorks.Person.Address
WHERE City = @City
GO
The messages that are returned would be similar to this:
Command(s) completed successfully.
Using SET NOCOUNT ON and @@ROWCOUNT
-- not using SET NOCOUNT ON
CREATE PROCEDURE uspGetAddress @City nvarchar(30)
AS
SET NOCOUNT ON
SELECT *
FROM AdventureWorks.Person.Address
WHERE City = @City
PRINT @@ROWCOUNT
GO
The messages that are returned would be similar to this:
20
Dropping Single Stored Procedure
To drop a single stored procedure you use the DROP PROCEDURE or DROP PROC command as follows.
DROP PROCEDURE uspGetAddress
Dropping Multiple Stored Procedures
DROP PROCEDURE uspGetAddress, uspInsertAddress, uspDeleteAddress
GO
Sql Server Views
Sql Server Views
Creates a virtual table that represents the data in one or more tables in an alternative way. CREATE VIEW must be the first statement in a query batch.
schema_name
Is the name of the schema to which the view belongs.
view_name
Is the name of the view.
Column
Is the name to be used for a column in a view. A column name is required only when a column is derived from an arithmetic expression, a function, or a constant.
AS
Specifies the actions the view is to perform.
select_statement
Is the SELECT statement that defines the view. The statement can use more than one table and other views. In an indexed view definition, the SELECT statement must be a single table statement or a multitable JOIN with optional aggregation.
The SELECT clauses in a view definition cannot include the following:
- COMPUTE or COMPUTE BY clauses
- An ORDER BY clause, unless there is also a TOP clause in the select list of the SELECT statement
- The INTO keyword
- The OPTION clause
- A reference to a temporary table or a table variable.
Syntax
CREATE VIEW view_name
[(column_name[,column_name]….)]
[WITH ENCRYPTION]
AS select_statement [WITH CHECK OPTION]
CREATE VIEW hiredate_view
AS
SELECT p.FirstName, p.LastName, e.BusinessEntityID, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ;
GO
ENCRYPTION
Encrypts the entries in sys.syscomments that contain the text of the CREATE VIEW statement. Using WITH ENCRYPTION prevents the view from being published as part of SQL Server replication.
CREATE VIEW Purchasing.PurchaseOrderReject
WITH ENCRYPTION
AS
SELECT PurchaseOrderID, ReceivedQty, RejectedQty,
RejectedQty / ReceivedQty AS RejectRatio, DueDate
FROM Purchasing.PurchaseOrderDetail
WHERE RejectedQty / ReceivedQty > 0
AND DueDate > CONVERT(DATETIME,'20010630',101) ;
GO
SCHEMABINDING
Binds the view to the schema of the underlying table or tables. When SCHEMABINDING is specified, the base table or tables cannot be modified in a way that would affect the view definition. The view definition itself must first be modified or dropped to remove dependencies on the table that is to be modified. When you use SCHEMABINDING, the select_statement must include the two-part names - (schema.object) of tables, views, or user-defined functions that are referenced. All referenced objects must be in the same database.
If a view is not created with the SCHEMABINDING clause, sp_refreshview should be run when changes are made to the objects underlying the view that affect the definition of the view. Otherwise, the view might produce unexpected results when it is queried.
CREATE VIEW vwSample
With SCHEMABINDING
As
SELECT
CustomerID,
CompanyName,
ContactName
FROM DBO.CUSTOMERS -- Two part name [ownername.objectname]
GO
When a view is created, information about the view is stored in the following catalog views: sys.views, sys.columns, and sys.sql_expression_dependencies. The text of the CREATE VIEW statement is stored in the sys.sql_modules catalog view.
Updatable Views
You can modify the data of an underlying base table through a view, as long as the following conditions are true:
- Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table.
- The columns being modified in the view must directly reference the underlying data in the table columns. The columns cannot be derived in any other way, such as through the following:
- An aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV, STDEVP, VAR, and VARP.
- A computation. The column cannot be computed from an expression that uses other columns. Columns that are formed by using the set operators UNION, UNION ALL, CROSSJOIN, EXCEPT, and INTERSECT amount to a computation and are also not updatable.
The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.
- TOP is not used anywhere in the select_statement of the view together with the WITH CHECK OPTION clause.
INSTEAD OF Triggers
INSTEAD OF triggers can be created on a view to make a view updatable. The INSTEAD OF trigger is executed instead of the data modification statement on which the trigger is defined. This trigger lets the user specify the set of actions that must happen to process the data modification statement. Therefore, if an INSTEAD OF trigger exists for a view on a specific data modification statement (INSERT, UPDATE, or DELETE), the corresponding view is updatable through that statement. For more information about INSTEAD OF triggers
Partitioned Views
If the view is a partitioned view, the view is updatable, subject to certain restrictions. When it is needed, the Database Engine distinguishes local partitioned views as the views in which all participating tables and the view are on the same instance of SQL Server, and distributed partitioned views as the views in which at least one of the tables in the view resides on a different or remote server.
A partitioned view is a view defined by a UNION ALL of member tables structured in the same way, but stored separately as multiple tables in either the same instance of SQL Server or in a group of autonomous instances of SQL Server servers, called federated database servers.
--Partitioned view as defined on Server1
CREATE VIEW Customers
AS
--Select from local member table.
SELECT *
FROM CompanyData.dbo.Customers_33
UNION ALL
--Select from member table on Server2.
SELECT *
FROM Server2.CompanyData.dbo.Customers_66
UNION ALL
--Select from mmeber table on Server3.
SELECT *
FROM Server3.CompanyData.dbo.Customers_99
Delete duplicate records from a table
In that case delete statement is executed as:
DELETE FROM tableName WHERE columnName IN (value1,value2,value3)
Here all the records with value1, value2, value3 will be deleted.
What is required is that we have to retain one record and delete all other duplicate records. To retain one records, the where clause must be defined on the column that uniquely identified the record.
Lets do it by example.
Suppose we have a table called tblDocs with 3 columns
CREATE TABLE tblDocs(
id INT IDENTITY(1, 1)
,docId INT
,orgId INT
)
Here we insert some values in tblDocs
INSERT INTO tblDocs VALUES (1, 1)
INSERT INTO tblDocs VALUES (1, 2)
INSERT INTO tblDocs VALUES (1, 3)
INSERT INTO tblDocs VALUES (2, 1)
INSERT INTO tblDocs VALUES (3, 2)
INSERT INTO tblDocs VALUES (2, 2)
INSERT INTO tblDocs VALUES (1, 1) --Duplicate record
INSERT INTO tblDocs VALUES (1, 3) --Duplicate record
INSERT INTO tblDocs VALUES (4, 3)
Now we have result area of tblDocs as below
Now here if we see then record with id 7 & 8 are duplicate with id 1 & 3. Now suppose we have a requirement where we need to put the unique index then we need to delete duplicate record. Here we can either keep id 1 & 3 or id 7 & 8, depends upon MIN or MAX in the below query. We can achieve the requirement with the help of below query:
DELETE FROM tblDocs
WHERE id NOT IN(
SELECT MIN(id) FROM tblDocs GROUP BY docId, orgId
)
There might be other way to fulfill this requirement. This one I used during my project. Comments and other solutions are always welcome.
SQL Server Table Hints
The syntax for a table hint is:
SELECT column_list FROM table_name WITH (INDEX (index_name) [, ...])
Here's an example:
SELECT memberID, fname, lname FROM members WITH (INDEX(memberID_index))
While you can use an index ID instead of the index name, this is not a good idea. This is because index IDs can change if you drop and recreate the indexes differently than when you first created them.
In most cases, the SQL Server Query Optimizer will correctly evaluate a query and run it as optimally as possible. But on occasion the Query Optimizer will fail, producing a less than optimal execution plan, and query performance will suffer because of it. When you identify such a query, you can override the Query Optimizer using what is called an optimizer hint. Optimizer hints can be divided into five different categories:
• Table Hints: Used to force index selection.
• Join Hints: Used to specify the type of JOIN strategy used.
• Query Hints: Hints used to affect GROUP BY and UNION functionality.
• Lock Hints : Used to help avoid undesirable locking.
• View Hints: Used to specify indexes in indexed views.
• Other Hints: Misc. hints.
In essence, what a hint does is override what the Query Optimizer recommends. If the Query Optimizer is wrong, then using a hint is very beneficial. But if the hint you specify is wrong, or not ideal, it will continue to be wrong until you change it, hurting performance until it is corrected.
Sql Server Index Basics
Indexes are created on columns in tables or views. For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server first finds that value in the index, and then uses the index to quickly locate the entire row of data. Without the index, a table scan would have to be performed in order to locate the row, which can have a significant effect on performance.
You can create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar(max). You can also create indexes on XML columns.
An index is made up of a set of pages (index nodes) that are organized in a B-tree structure. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom.
When a query is issued against an indexed column, the query engine starts at the root node and navigates down through the intermediate nodes, with each layer of the intermediate level more granular than the one above. The query engine continues down through the index nodes until it reaches the leaf node.
Clustered Indexes
A clustered index stores the actual data rows at the leaf level of the index. An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view. In addition, data in a table is sorted only if a clustered index has been defined on a table.
Note: A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap.
Nonclustered Indexes
Unlike a clustered indexed, the leaf nodes of a nonclustered index contain only the values from the indexed columns and row locators that point to the actual data rows, rather than contain the data rows themselves. This means that the query engine must take an additional step in order to locate the actual data.
A row locator’s structure depends on whether it points to a clustered table or to a heap. If referencing a clustered table, the row locator points to the clustered index, using the value from the clustered index to navigate to the correct data row. If referencing a heap, the row locator points to the actual data row.
Nonclustered indexes cannot be sorted like clustered indexes; however, you can create more than one nonclustered index per table or view. SQL Server 2005 supports up to 249 nonclustered indexes, and SQL Server 2008 support up to 999. This certainly doesn’t mean you should create that many indexes. Indexes can both help and hinder performance, as I explain later in the article.
Index Types
Simple Index:-
An index created on a single column of a table it is called simple index. The syntax for creating simple index that allow duplicate values is :-
CREATE INDEX ix_name ON tableName (columnName)
Composite Index:-
An index that contains more than one column is known as composite index. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and nonclustered indexes can be composite indexes. The syntax creating a composite index that allows duplicate value is:-
CREATE INDEX ix_name ON tableName (column1, column2)
Unique Index:-
An index that ensures the uniqueness of each value in the indexed column is known as unique index. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.
If an index is created on a single column it is called Simple Unique index. The syntax for creating simple unique index is:-
CREATE UNIQUE INDEX ux_name ON tableName(columnName)
If an index is created on more than one it column is called composite unique index. The syntax for creating a composite unique index is:-
CREATE UNIQUE INDEX ux_name ON tableName (column1, column2,..)
A unique index is automatically created when you define a primary key or unique constraint:
Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.
Unique: When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.
Dropping index
Indexes associated with table can be removed by using the drop index command.
DROP INDEX tableName.ix_name
Index Design
As beneficial as indexes can be, they must be designed carefully. Because they can take up significant disk space, you don’t want to implement more indexes than necessary. In addition, indexes are automatically updated when the data rows themselves are updated, which can lead to additional overhead and can affect performance. As a result, index design should take into account a number of considerations.
• Every index increases the time in takes to perform INSERTS, UPDATES and DELETES, so the number of indexes should not be very much. Try to use maximum 4-5 indexes on one table, not more. If you have read-only table, then the number of indexes may be increased.
• Keep your indexes as narrow as possible. This reduces the size of the index and reduces the number of reads required to read the index.
• Try to create indexes on columns that have integer values rather than character values.
• If you create a composite (multi-column) index, the order of the columns in the key are very important. Try to order the columns in the key as to enhance selectivity, with the most selective columns to the leftmost of the key.
• If you want to join several tables, try to create surrogate integer keys for this purpose and create indexes on their columns.
• Create surrogate integer primary key (identity for example) if your table will not have many insert operations.
• Clustered indexes are more preferable than nonclustered, if you need to select by a range of values or you need to sort results set with GROUP BY or ORDER BY.
• If your application will be performing the same query over and over on the same table, consider creating a covering index on the table.
• You can use the SQL Server Profiler Create Trace Wizard with "Identify Scans of Large Tables" trace to determine which tables in your database may need indexes. This trace will show which tables are being scanned by queries instead of using an index.
• You can use the sp_MSforeachtable undocumented stored procedure to rebuild all indexes in your database. Try to schedule it to execute during CPU idle time and slow production periods.
sp_MSforeachtable @command1="print '?' DBCC DBREINDEX ('?')"