Tuesday, March 24, 2009

Normalization

A database has to be designed before it can be created and used.
The design should follow standards that allow for memory savings, fast access, easy maintenance, portability, ease of future enhancement, good performance and cost efficiency, among others.
The final logical design of a database should balance optimal performance with data integrity. This can be achieved through a process known as normalization. The database should end up in a "fully normalized form".

DEFINITION OF NORMALIZATION

Normalization is a set of rules for analyzing data structures and transforming them into relations that exhibit consistency, minimal redundancy and maximum stability.

The need for normalization is best understood by looking at the anomalies, or drawbacks, of data that is NOT NORMALIZED. Consider the table in Figure 3. It holds all the details of a company's employees, together with the details of the Department each one belongs to.

At first glance, storing everything in a single table seems convenient. But certain anomalies can appear when data is inserted, updated or deleted. Normalization provides a method for removing these unwanted anomalies, making the database more reliable and stable.

Insertion anomaly (INSERT)

Suppose a new Department has been created that has no employees yet. In our original table, the employee columns of that row would be empty (null), and only the Department information would be present: the "numDept" and "descDept" columns.

Update anomaly (UPDATE)

Suppose the number of the "Systems" Department has been changed to AB108. The department number then has to be changed in the row of every employee who belongs to "Systems", which costs additional time and system resources.

Deletion anomaly (DELETE)

If every employee in the "Finance" Department leaves the company, all of their records have to be deleted. Once that is done, the details of the "Finance" Department are lost as well. The data in the table no longer reflects the true state of the company, so data integrity is lost.
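The three anomalies can be reproduced in a few lines. The sketch below is only an illustration: it uses Python's built-in sqlite3 module, and the table name emp_dept, the numEmp and nameEmp columns and all sample rows are assumptions (only numDept, descDept and the number AB108 come from the text above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical unnormalized table: employee and department details together.
conn.execute("""
    CREATE TABLE emp_dept (
        numEmp   TEXT,
        nameEmp  TEXT,
        numDept  TEXT,
        descDept TEXT
    )
""")
conn.executemany(
    "INSERT INTO emp_dept VALUES (?, ?, ?, ?)",
    [
        ("E1", "Ana",   "AB100", "Systems"),
        ("E2", "Luis",  "AB100", "Systems"),
        ("E3", "Marta", "AB200", "Finance"),
    ],
)

# Insertion anomaly: a department with no employees forces NULL employee columns.
conn.execute("INSERT INTO emp_dept VALUES (NULL, NULL, 'AB300', 'Marketing')")

# Update anomaly: renumbering one department touches one row per employee.
cur = conn.execute(
    "UPDATE emp_dept SET numDept = 'AB108' WHERE descDept = 'Systems'"
)
rows_touched = cur.rowcount

# Deletion anomaly: removing the last Finance employee erases the department too.
conn.execute("DELETE FROM emp_dept WHERE numEmp = 'E3'")
finance_left = conn.execute(
    "SELECT COUNT(*) FROM emp_dept WHERE descDept = 'Finance'"
).fetchone()[0]

print(rows_touched, finance_left)
```

With a separate departments table, the renumbering would touch a single row and the Finance department would survive the deletion of its last employee.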

PROPERTIES OF A DATABASE AFTER NORMALIZATION

A normalized database should exhibit the following properties:

Data storage requirements are minimized, because the normalization process systematically eliminates duplicated data.
Because data is stored in the minimum number of places, the chances of inconsistent information are kept to a minimum.
Normalized structures are well suited to updating data. Because each piece of data exists in the minimum number of places, an UPDATE operation needs to touch a minimum amount of data.

NORMALIZATION PROCEDURES

The normalization process basically involves three steps. After each step the database is in one of the so-called "normal forms". In general, the "third normal form" is the state a database should reach before it is said to be fully normalized. Fourth and fifth normal forms also exist, but they are rarely needed in everyday database design.

Next, we will work through a short exercise based on a Purchase Order document, which we will try to convert to a normalized form. First, though, a few ground rules:

Properties of a relation

A table must satisfy certain criteria before it qualifies as a relation.

Unique key

Every record must have a unique key that identifies it. Any attribute can serve as a key, but where possible we choose the attribute with the shortest, fixed length, such as an ID number. If a single attribute is not enough to identify a record uniquely, the unique key can be made up of more than one attribute. In that case, the key should contain the minimum number of attributes that is necessary and sufficient.

No duplicates

There must never be two completely identical columns or rows. If two rows are completely identical, some attribute that tells them apart is missing. Example: two compact disc records in a store would be identical if they are two copies of Shakira's latest album, were it not for the code number on each disc that makes them different.

Irrelevance of order

The sequence in which attributes are written must not matter. We can write the employee ID first, or the first and last name first, and this will not affect the relationships we establish with other tables. Likewise, records must be completely independent of their sequence or position in the database (no positional dependence). This means that if we try to identify a record by its position within the table, we are creating an invalid key.

Unnormalized form

Data in its elementary form is not normalized. So the first thing to start with is the elementary data items that will make up the data dictionary. The data dictionary is built from the company's documents or flow diagrams. The items are listed one below the other. This gives us the unnormalized form for the RDA (Relational Data Analysis) exercise, which should leave us with distinct groups of items at the end. Later, these groups are combined with the groups from other documents that have also been through RDA, and relationships are established between them.

Source: SENA - CEV Comunidad Educativa Virtual


PROPERTIES OF AN RDBMS - CODD'S RULES

A database management system (DBMS) can be considered relational if it follows three golden rules:

All information must be represented in tables.
Data retrieval must be possible using the SELECT, JOIN and PROJECT operations.
All relationships between data must be represented explicitly in the data itself.

To define the requirements of a relational database system (RDBMS) more rigorously, Codd formulated 12 rules commonly known as Codd's Rules. A product can be called truly and fully relational if it follows all of the rules, but no product actually satisfies every one. That is why Rule No. 0 has come into general use: "Any true relational database must be manageable entirely through its own relational capabilities".

NOTE FOR STUDENTS: THE STATEMENTS BELOW ARE THE THEORETICAL RULES THAT CODD ESTABLISHED TO DEFINE A RELATIONAL DATABASE MANAGEMENT SYSTEM.
THE TERMINOLOGY IS VERY TECHNICAL AND MAY NOT BE FULLY UNDERSTOOD WITHOUT ADVANCED EXPERIENCE OF SQL-BASED APPLICATIONS. THE RULES ARE PRESENTED AS THEORETICAL BACKGROUND. A SUMMARY IN PLAIN TERMS FOLLOWS LATER.

Rule No. 1 - The information rule

"All information in an RDBMS is represented explicitly in exactly one way, by values in a table". Anything that does not exist in a table does not exist at all.

All information, including table names, view names, column names and the column data itself, must be stored in tables within the database. The tables that hold this information make up the Data Dictionary.

Rule No. 2 - The guaranteed access rule

"Every data item must be logically accessible through a lookup that combines the table name, its primary key and the column name".

This means that, given a table name, a primary key value and the name of the required column, one and only one value must be found. For this reason, defining a primary key for every table is practically mandatory.
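As a rough illustration of guaranteed access, the following sketch addresses a single value by table name, primary key value and column name. It uses Python's sqlite3 module. The employees table, its columns and the get_item helper are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (num_emp INTEGER PRIMARY KEY, name TEXT, num_dept TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ana", "D1"), (2, "Luis", "D2")],
)

def get_item(conn, table, key_column, key_value, column):
    """Return the single value addressed by (table, primary key value, column)."""
    # Identifiers cannot be bound as parameters, so they are quoted by hand here;
    # only the key value goes through a placeholder.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ?'
    rows = conn.execute(sql, (key_value,)).fetchall()
    if len(rows) != 1:
        raise LookupError(f"expected exactly one row, found {len(rows)}")
    return rows[0][0]

value = get_item(conn, "employees", "num_emp", 2, "name")
print(value)  # Luis
```

Because num_emp is the primary key, the lookup can never match more than one row, which is what makes the "one and only one value" guarantee possible.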

Rule No. 3 - Systematic treatment of null values

"Missing or inapplicable information can be represented through null values". An RDBMS must support null values in columns whose values are unknown or inapplicable.

Rule No. 4 - The database description rule

"The description of the database is stored in the same way as ordinary data, that is, in tables and columns, and must be accessible to authorized users".

Information about tables, views, user access permissions and so on must be stored in exactly the same way: in tables. These tables must be accessible like any other table, through SQL statements.
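One concrete example of a catalog that is itself a table: SQLite keeps its schema in a table named sqlite_master, which can be read with an ordinary SELECT. The sketch below uses Python's sqlite3 module, and the departments table and dept_names view are made-up names for illustration. Other products expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (num_dept TEXT PRIMARY KEY, desc_dept TEXT)")
conn.execute("CREATE VIEW dept_names AS SELECT desc_dept FROM departments")

# The catalog is queried like any other table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()
print(catalog)  # [('table', 'departments'), ('view', 'dept_names')]
```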

Rule No. 5 - The comprehensive sublanguage rule

"There must be at least one language that is comprehensive in supporting data definition, data manipulation, view definition, integrity constraints, and authorization and transaction control".
This means there must be at least one language with a well-defined syntax that can be used to administer the database completely.

Rule No. 6 - The view updating rule

"All views that are theoretically updatable must be updatable by the system itself". Most RDBMSs allow simple views to be updated but reject attempts to update complex views.

Rule No. 7 - The insert and update rule

"The ability to handle a database with simple operands applies not only to retrieving or querying data, but also to inserting, updating and deleting data".

This means that SELECT, UPDATE, DELETE and INSERT statements must be available and must work on records regardless of the kinds of relationships and constraints that exist between the tables.

Rule No. 8 - The physical independence rule

"User access to the database, through terminals or application programs, must remain logically consistent whenever the stored data or the methods used to access it are changed".

The behavior of application programs and of user activity at terminals should be predictable from the logical definition of the database, and that behavior should remain unchanged regardless of changes to its physical definition.

Rule No. 9 - The logical independence rule

"Application programs and terminal access activities must remain logically unaffected whenever changes (subject to the permissions assigned) are made to the tables of the database".
Logical data independence means that application programs and terminal activities must be independent of the logical structure, so changes to the logical structure must not force changes to those programs.

Rule No. 10 - The integrity independence rule

"All integrity constraints must be definable in the data language and storable in the catalog, not in the application program". The integrity rules are:

1. No component of a primary key may be blank or null. (This is the basic integrity rule.)

2. For every foreign key value there must be a matching primary key value. Together, these two rules ensure referential integrity.

Rule No. 11 - The distribution rule

"The system must have a data language that can support the database being physically distributed across different locations without affecting or altering application programs".

Support for distributed databases means that an arbitrary collection of relations and databases, running on a mix of different machines and operating systems and connected by a variety of networks, can behave as if it were a single database on a single machine.

Rule No. 12 - The nonsubversion rule

"If the system has low-level languages, those languages cannot be used in any way to violate the integrity rules and constraints expressed in a high-level language (such as SQL)". Some products merely build a relational interface on top of non-relational databases, which makes subversion (violation) of the integrity constraints possible. This must not be allowed.

SUMMARY

AS PROMISED, HERE IS A SUMMARY IN PLAIN TERMS THAT ILLUSTRATES THE MAIN DATABASE CONCEPTS AND CODD'S RULES AS APPLIED IN PRACTICE.

An RDBMS must give users the ability to store data in the database, access it and update it. This is the fundamental function of an RDBMS and, of course, the RDBMS must hide the internal physical structure (file organization and storage structures) from the user.

An RDBMS must provide a catalog that stores the descriptions of the data and that users can access. This catalog is what is called the data dictionary, and it contains information that describes the data in the database (metadata). A data dictionary normally stores the name, type and size of the data items, and the names of the relationships between them.

An RDBMS must provide a mechanism guaranteeing that either all the updates belonging to a given transaction are carried out, or none of them is. In a company's employee system, for example, a transaction could be registering a new employee or removing a job position. A slightly more complicated transaction would be removing a Department or division and reassigning all of its employees to another Department. In this case several changes have to be made to the database. If the transaction fails part-way through, for example because the hardware fails (or the power goes out), the database would be left in an inconsistent state: some changes would have been made and others not. The changes already made must therefore be undone to return the database to a consistent state.
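The all-or-nothing behavior can be sketched with the department example from the paragraph above. This is only an illustration using Python's sqlite3 module. The table layout, the remove_department helper and the simulated failure are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (num_dept TEXT PRIMARY KEY, desc_dept TEXT)")
conn.execute(
    "CREATE TABLE employees (num_emp INTEGER PRIMARY KEY, name TEXT, num_dept TEXT)"
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)", [("D1", "Finance"), ("D2", "Systems")]
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)", [(1, "Ana", "D1"), (2, "Luis", "D1")]
)
conn.commit()

def remove_department(conn, old_dept, new_dept, fail_midway=False):
    """Reassign employees and delete the department as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE employees SET num_dept = ? WHERE num_dept = ?",
                (new_dept, old_dept),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two steps")
            conn.execute("DELETE FROM departments WHERE num_dept = ?", (old_dept,))
        return True
    except RuntimeError:
        return False

def count(conn, sql):
    return conn.execute(sql).fetchone()[0]

# Failed attempt: the reassignment that was already done is undone.
failed = remove_department(conn, "D1", "D2", fail_midway=True)
still_in_d1 = count(conn, "SELECT COUNT(*) FROM employees WHERE num_dept = 'D1'")

# Successful attempt: both changes become visible together.
done = remove_department(conn, "D1", "D2")
now_in_d2 = count(conn, "SELECT COUNT(*) FROM employees WHERE num_dept = 'D2'")
depts_left = count(conn, "SELECT COUNT(*) FROM departments")

print(failed, still_in_d1, done, now_in_d2, depts_left)
```

After the simulated failure both employees are still in D1, so no half-finished reassignment is ever left behind.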

An RDBMS must provide a mechanism for recovering the database if some event damages it. As mentioned above, when the system fails in the middle of a transaction, the database must be returned to a consistent state. The failure may be caused by a hardware fault or a software error that makes the RDBMS abort, or by the user detecting an error during the transaction and aborting it before it completes. In all of these cases, the RDBMS must provide a mechanism that recovers the database and brings it back to a consistent state.

An RDBMS must provide a mechanism guaranteeing that only authorized users can access the database. Protection must cover unauthorized access, whether intentional or accidental.

An RDBMS must be able to integrate with communication software. Many users access the database from terminals. Sometimes these terminals are connected directly to the machine running the RDBMS. In other cases the terminals are in remote locations, so communication with the machine hosting the RDBMS has to go over a network. In either case, the RDBMS receives requests in the form of messages and responds in a similar way. All of these message transmissions are handled by the data communications manager. Although this manager is not part of the RDBMS, the RDBMS needs to be able to integrate with it for the system to be commercially viable.

An RDBMS must provide the means to guarantee that both the data in the database and the changes made to that data follow certain rules. Database integrity requires the stored data to be valid and consistent. It can be seen as another way of protecting the database, but beyond its connection with security it has other implications. Integrity is concerned with the quality of the data. It is normally expressed through constraints, which are rules that the database is not allowed to violate. For example, a constraint could state that an employee's national ID (cédula) number may not contain non-numeric characters, or that a new employee must belong to a Department in order to be registered. In such cases it would be desirable for the RDBMS to check that these rules are not violated every time an employee's data is entered into the company database.

Source: SENA - CEV Comunidad Educativa Virtual


Relational Database Systems (RDBMS)

Relational database management systems (RDBMS) such as Oracle, MySQL, SQL Server, PostgreSQL and Informix, among others, let you carry out the tasks listed below in an understandable and reasonably simple way:

They let you enter data into the system.
They let you store the data.
They let you retrieve the data and work with it.
They give you tools to capture, edit and manipulate data.
They let you apply security.
They let you create reports from the data.

DEFINITION AND TERMINOLOGY OF AN RDBMS

Relational database systems are those that logically store and manage data in the form of tables. A TABLE, in turn, is a way of presenting data in rows and columns.

Each column represents a single field of a record. Several of these columns, or fields, make up a record, providing meaningful, interrelated information. Each record is represented as a row. A table can consist of several columns. Tables that hold interrelated and interdependent data are grouped by establishing relationships between them. By managing the tables and their relationships, we have the means to insert, delete, query and update the information in an RDBMS.

In the table above, the Employees table consists of three columns and three rows.
The columns, or fields, make up a logical record corresponding to one employee.
The Employees table is related to the Departments table through a "Department Number" column that appears in both tables.

Primary Key

We have seen that data in a relational database is stored logically in tables. Each table has a unique name. To identify a particular row in a table, a column or combination of columns is used. This column must be such that it uniquely and unambiguously identifies each row.

No two rows (records) in a table may have the same value in the column chosen as the primary key. A column designated as the primary key cannot contain duplicate values or nulls.

For example, in the Employees table shown in Figure No. 1, each employee has a unique employee number. The "NUM-EMP" column can be chosen as the primary key. Similarly, the "NUM-DEPT" column in the Departments table can serve as its primary key.

Foreign Key

Primary keys and foreign keys are used to establish relationships between tables. In Figure No. 1, the values of the "NUM-DEPT" column in the Employees table all fall within the set of values of the "NUM-DEPT" column in the Departments table. An employee must belong to a Department that is listed in the Departments table.

The "NUM-DEPT" column in the Employees table is therefore considered a foreign key. The existence of this foreign key in the Employees table ensures that a new employee record cannot be entered unless the employee belongs to an existing Department.

If the employee to be added works in a Department that is not listed in the Departments table, the Department's record must first be created in its own table, and only then can the employee be entered. The control imposed by a foreign key is very useful for preventing orphan records and inconsistent data, topics we will look at later. In addition, as noted at the start, the foreign key lets us relate two tables, which allows information to be shared and split up so that the same data is not duplicated across several tables. These ideas are made concrete in the section on table normalization, covered in a later chapter.
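The order of operations described above (department first, then employee) can be checked with a small sketch. It uses Python's sqlite3 module. The lowercase table and column names and the add_employee helper are assumptions for this example, and note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute(
    "CREATE TABLE departments (num_dept TEXT PRIMARY KEY NOT NULL, desc_dept TEXT)"
)
conn.execute("""
    CREATE TABLE employees (
        num_emp  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        num_dept TEXT NOT NULL REFERENCES departments (num_dept)
    )
""")

def add_employee(conn, num_emp, name, num_dept):
    """Insert an employee; return False if the foreign key check rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employees VALUES (?, ?, ?)", (num_emp, name, num_dept)
            )
        return True
    except sqlite3.IntegrityError:
        return False

# The department does not exist yet, so the employee row is rejected.
first_try = add_employee(conn, 1, "Ana", "D1")

# Create the department first, then the same insert succeeds.
with conn:
    conn.execute("INSERT INTO departments VALUES ('D1', 'Systems')")
second_try = add_employee(conn, 1, "Ana", "D1")

print(first_try, second_try)  # False True
```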

In Figure No. 2 we have adopted the following convention:
In table schemas, primary keys are underlined.
Referential integrity constraints are drawn as connecting lines that run from each foreign key to the primary key it references. For clarity, the arrowhead points toward the primary key of the referenced table.

Nulls

A null can be interpreted as an undefined value or as no value at all. Nulls are used in columns where the value is unknown. A null does not mean blank spaces. A null cannot be used meaningfully in calculations or comparisons, because the result is itself unknown. A null can be loosely compared to infinity. One null is not equal to another null.
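These properties of nulls can be observed directly. The short sketch below uses Python's sqlite3 module, where an SQL NULL is returned as Python's None. Other database products follow the same three-valued logic but may display the results differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row with four expressions: comparison, arithmetic, the IS NULL test,
# and a check that an empty string is not the same thing as NULL.
eq, total, is_null, blank_is_null = conn.execute(
    "SELECT NULL = NULL, NULL + 1, NULL IS NULL, '' IS NULL"
).fetchone()

print(eq, total, is_null, blank_is_null)  # None None 1 0
```

Comparing two nulls with = yields null rather than true, so the IS NULL test is the way to find them.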

Views

An RDBMS manages the physical structure of the data and its storage. This makes the RDBMS a very useful tool. From the user's point of view, however, one could argue that RDBMSs have made things more complicated, because users now see more data than they really want or need: they see the entire database. To address this, RDBMSs provide a view mechanism that gives each user their own view of the database. The data definition language allows views to be defined as subsets of the database. Besides reducing complexity by letting each user see only the part of the database they need, views have other advantages:

Views provide a level of security, because they make it possible to hide data from certain users.
Views provide a mechanism for users to see the data in the format they want.
A view presents a consistent, unchanging picture of the database, even if the structure of the database changes.
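A small sketch of the first and last advantages, using Python's sqlite3 module. The employees table, its salary column and the employee_directory view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        num_emp  INTEGER PRIMARY KEY,
        name     TEXT,
        salary   REAL,
        num_dept TEXT
    )
""")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Ana", 3200.0, "D1"), (2, "Luis", 2800.0, "D2")],
)

# The view exposes only the non-sensitive columns.
conn.execute(
    "CREATE VIEW employee_directory AS "
    "SELECT num_emp, name, num_dept FROM employees"
)

def view_columns(conn):
    cur = conn.execute("SELECT * FROM employee_directory")
    return [d[0] for d in cur.description]

cols_before = view_columns(conn)

# A structural change to the base table leaves the view's shape untouched.
conn.execute("ALTER TABLE employees ADD COLUMN hire_date TEXT")
cols_after = view_columns(conn)

print(cols_before, cols_after)
```

Users who query employee_directory never see salary, and they are unaffected when hire_date is added to the base table.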

Source: SENA - CEV Comunidad Educativa Virtual


Database Management Systems

A database management system (DBMS) is a computer-based system (software) that manages a database, or a collection of databases or files. The person who administers a DBMS is known as the DBA (Database Administrator).

USES AND FUNCTIONS OF A DBMS

Database management systems are used to:

• Let users access and manipulate the database, by providing ways to build data processing systems for applications that need access to the data.

• Give administrators the tools they need to carry out data maintenance and administration tasks.

Some of the functions of a DBMS are:

• Database definition - how the information will be stored and organized.

• Database creation - storing data in a defined database.

• Data retrieval - queries and reports.

• Data update - changing the contents of the database.

• Application programming for software development.

• Database integrity control.

• Monitoring of database behavior.

CHARACTERISTICS OF A DBMS

Control of data redundancy
This means using as little storage space as possible by avoiding duplicated information. The result is savings in processing time, fewer inconsistencies, lower operating costs and easier maintenance.

Data sharing
One of the main characteristics of databases is that the data can be shared among many users simultaneously, which makes for efficient use of the data.

Integrity maintenance

Data integrity is what guarantees the accuracy of the information held in a database. Interrelated data must always present correct information to users.

Support for transaction control and failure recovery

Any operation performed on the database is known as a transaction. Transactions must therefore be controlled so that they do not compromise the integrity of the database. Failure recovery concerns the ability of a DBMS to recover information lost during a software or hardware failure.

Data independence

In file-based applications, the application program must know both how the data is organized and the techniques that let it access the data. In a DBMS, application programs do not need to know how the data is organized on disk; they are completely independent of it.

Security
Access to the data can be restricted to certain users. Depending on the privileges each database user holds, some will be able to access more information than others.

Speed

Modern DBMSs offer fast response and processing times.

Hardware independence
Most DBMSs are available for installation on multiple hardware platforms.

Source: SENA - CEV Comunidad Educativa Virtual

Monday, March 16, 2009

SQL-Schema Statements

SQL-Schema Statements provide maintenance of catalog objects for a schema -- tables, views and privileges. This subset of SQL is also called the Data Definition Language for SQL (SQL DDL).

There are 6 SQL-Schema Statements:

CREATE TABLE Statement -- create a new base table in the current schema
CREATE VIEW Statement -- create a new view table in the current schema
DROP TABLE Statement -- remove a base table from the current schema
DROP VIEW Statement -- remove a view table from the current schema
GRANT Statement -- grant access privileges for objects in the current schema to other users
REVOKE Statement -- revoke previously granted access privileges for objects in the current schema from other users

Schema Overview

A relational database contains a catalog that describes the various elements in the system. The catalog divides the database into sub-databases known as schemas. Within each schema are database objects -- tables, views and privileges.

The catalog itself is a set of tables with its own schema name - definition_schema. Tables in the catalog cannot be modified directly. They are modified indirectly with SQL-Schema statements.

Tables

The database table is the root structure in the relational model and in SQL. A table (called a relation in relational terminology) consists of rows and columns. In relational terminology, rows are called tuples and columns are called attributes. Tables are often displayed in a flat format, with columns arrayed horizontally and rows vertically.

[Figure: table layout, with columns running across and rows running down]

Database tables are a logical structure with no implied physical characteristics. Primary among the various logical tables is the base table. A base table is persistent and self contained, that is, all data is part of the table itself with no information dynamically derived from other tables.

A table has a fixed set of columns. The columns in a base table are accessed not by position but by name, which must be unique among the columns of the table. Each column has a defined data type, and the value for the column in each row must be from the defined data type or null.

A table has 0 or more rows. A row in a base table has a value or null for each column in the table. The rows in a table have no defined ordering and are not accessed positionally. A table row is accessed and identified by the values in its columns.

In SQL92, base tables can have duplicate rows (rows where each column has the same value or null). However, the relational model does not recognize tables with duplicate rows as valid base tables (relations). The relational model requires that each base table have a unique identifier, known as the Primary Key. The primary key for a table is a designated set of columns which have a unique value for each table row. For a discussion of Primary Keys, see Entity Integrity under CREATE TABLE below.

A base table is defined using the CREATE TABLE Statement. This statement places the table description in the catalog and initializes an internal entity for the actual representation of the base table.

Example base table - s:

sno	name	city
S1	Pierre	Paris
S2	John	London
S3	Mario	Rome

The s table records suppliers. It has 3 defined columns:

sno -- supplier number, a unique identifier that is the primary key
name -- the name of the supplier
city -- the city where the supplier is located

At the current time, there are 3 rows.

Other types of tables in the system are derived tables. SQL-Data statements use derived tables internally in computing results. A query is in fact a derived table. For instance, the Union query operator combines two derived tables to produce a third one. Much of the power of SQL comes from the fact that its higher level operations are performed on tables and produce a table as their result.
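To make the idea of derived tables concrete, here is a minimal sketch that runs a UNION over the example table s. It uses Python's sqlite3 module as a stand-in for an SQL92 system. The column types and the choice of cities in the query are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (sno TEXT PRIMARY KEY NOT NULL, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO s VALUES (?, ?, ?)",
    [("S1", "Pierre", "Paris"), ("S2", "John", "London"), ("S3", "Mario", "Rome")],
)

# Each SELECT produces a derived table; UNION combines them into a third one.
rows = conn.execute("""
    SELECT name FROM s WHERE city = 'Paris'
    UNION
    SELECT name FROM s WHERE city = 'Rome'
    ORDER BY name
""").fetchall()

print(rows)  # [('Mario',), ('Pierre',)]
```

The result is itself a table of rows and columns, which is why it could be fed into a further query.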

Derived tables are less constrained than base tables. Column names are not required and need not be unique. Derived tables may have duplicate rows. Views are a type of derived table that are cataloged in the database. See Views below.

Views

A view is a derived table registered in the catalog. A view is defined using a SQL query. The view is dynamically derived, that is, its contents are materialized for each use. Views are added to the catalog with the CREATE VIEW Statement.

Once defined in the catalog, a view can substitute for a table in SQL-Data statements. A view name can be used instead of a base table name in the FROM clause of a SELECT statement. Views can also be the subject of a modification statement with some restrictions.

A SQL Modification Statement can operate on a view if it is an updatable view. An updatable view has the following restrictions on its defining query:

The query FROM clause can reference a single table (or view)
The single table in the FROM clause must be:
- a base table,
- a view that is also an updatable view, or
- a nested query that is updatable, that is, it follows the rules for an updatable view query.
The query must be a basic query, not a:
- Grouping Query,
- Aggregate Query, or
- Union Query.
The select list cannot contain:
- the DISTINCT specifier,
- an Expression, or
- duplicate column references

Subqueries are acceptable in updatable views but cannot reference the underlying base table for the view's FROM clause.

Privileges

SQL92 defines a SQL-agent as an implementation-dependent entity that causes the execution of SQL statements. Prior to execution of SQL statements, the SQL-agent must establish an authorization identifier for database access. An authorization identifier is commonly called a user name.

A DBMS user may access database objects (tables, columns, views) as allowed by the privileges assigned to that specific authorization identifier. Access privileges may be granted by the system (automatic) or by other users.

System granted privileges include:

All privileges on a table to the user that created the table. This includes the privilege to grant privileges on the table to other users.
SELECT (readonly) privilege on the catalog (the tables in the schema - definition_schema). This is granted to all users.

User granted privileges cover privileges to access and modify tables and their columns. Privileges can be granted for specific SQL-Data Statements -- SELECT, INSERT, UPDATE, DELETE.

CREATE TABLE Statement

The CREATE TABLE Statement creates a new base table. It adds the table description to the catalog. A base table is a logical entity with persistence. The logical description of a base table consists of:

Schema -- the logical database schema the table resides in
Table Name -- a name unique among tables and views in the Schema
Column List -- an ordered list of column declarations (name, data type)
Constraints -- a list of constraints on the contents of the table

The CREATE TABLE Statement has the following general format:

CREATE TABLE table-name ({column-descr|constraint} [,{column-descr|constraint}]...)

table-name is the new name for the table. column-descr is a column declaration. constraint is a table constraint.

The column declaration can include optional column constraints. The declaration has the following general format:

column-name data-type [column-constraints]

column-name is the name of the column and must be unique among the columns of the table. data-type declares the type of the column. Data types are described below. column-constraints is an optional list of column constraints with no separators.

Constraints

Constraint specifications add additional restrictions on the contents of the table. They are automatically enforced by the DBMS. The column constraints are:

NOT NULL -- specifies that the column can't be set to null. If this constraint is not specified, the column is nullable, that is, it can be set to null. Normally, primary key columns are declared as NOT NULL.
PRIMARY KEY -- specifies that this column is the only column in the primary key. There can be only one primary key declaration in a CREATE TABLE. For primary keys with multiple columns, use the PRIMARY KEY table constraint. See Entity Integrity below for a detailed description of primary keys.
UNIQUE -- specifies that this column has a unique value or null for all rows of the table.
REFERENCES -- specifies that this column is the only column in a foreign key. For foreign keys with multiple columns, use the FOREIGN KEY table constraint. See Referential Integrity below for a detailed description of foreign keys.
CHECK -- specifies a user defined constraint on the table. See the table constraint - CHECK, below.

The table constraints are:

PRIMARY KEY -- specifies the set of columns that comprise the primary key. There can be only one primary key declaration in a CREATE TABLE Statement. See Entity Integrity below for a detailed description of primary keys.
UNIQUE -- specifies that a set of columns have unique values (or nulls) for all rows in the table. The UNIQUE specifier is followed by a parenthesized list of column names, separated by commas.
FOREIGN KEY -- specifies the set of columns in a foreign key. See Referential Integrity below for a detailed description of foreign keys.
CHECK -- specifies a user defined constraint, known as a check condition. The CHECK specifier is followed by a predicate enclosed in parentheses. For Intermediate Level SQL92, the CHECK predicate can only reference columns from the current table row, with no subqueries. Many DBMSs support subqueries in the check predicate.
The check predicate must evaluate to not False (that is, the result must be True or Unknown) before a modification or addition of a row takes place. The check is effectively made on the contents of the table after the modification. For INSERT Statements, the predicate is evaluated as if the INSERT row were added to the table. For UPDATE Statements, the predicate is evaluated as if the row were updated. For DELETE Statements, the predicate is evaluated as if the row were deleted (Note: A check predicate is only useful for DELETE if a self-referencing subquery is used.)
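The "not False" rule for CHECK can be tried out with a small sketch. It assumes Python's standard sqlite3 module and an in-memory SQLite database as a stand-in for a SQL92 DBMS; the table name sp_demo is made up for the illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a SQL92 DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sp_demo (
        sno VARCHAR(5) NOT NULL,
        pno VARCHAR(5) NOT NULL,
        qty INT CHECK (qty >= 0),
        PRIMARY KEY (sno, pno)
    )
""")

# qty = 200: the predicate is True, so the row is accepted.
conn.execute("INSERT INTO sp_demo VALUES ('S2', 'P1', 200)")
# qty = NULL: the predicate is Unknown (not False), so the row is accepted.
conn.execute("INSERT INTO sp_demo VALUES ('S1', 'P1', NULL)")

# qty = -5: the predicate is False, so the row is rejected.
try:
    conn.execute("INSERT INTO sp_demo VALUES ('S3', 'P1', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT sno, qty FROM sp_demo ORDER BY sno").fetchall()
print(rejected, rows)
```

Only the row with the negative quantity is refused; the row with a null quantity passes the check because Unknown is not False.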

Data Type

This subsection describes data type specifications. The data type categories are:

Character (String) -- fixed or variable length character strings. The character set is implementation defined but often defaults to ASCII.
Numeric -- values representing numeric quantities. Numeric values are divided into these two broad categories:
- Exact (also known as fixed-point) -- Exact numeric values have a fixed number of digits to the left of the decimal point and a fixed number of digits to the right (the scale). The total number of digits on both sides of the decimal are the precision. A special subset of exact numeric types with a scale of 0 is called integer.
- Approximate (also known as floating-point) -- Approximate numeric values that have a fixed precision (number of digits) but a floating decimal point.
All numeric types are signed.
Datetime -- Datetime values include calendar and clock values (Date, Time, Timestamp) and intervals. The datetime types are:
- Date -- calendar date with year, month and day
- Time -- clock time with hour, minute, second and fraction of second, plus a timezone component (adjustment in hours, minutes)
- Timestamp -- combination calendar date and clock time with year, month, day, hour, minute, second and fraction of second, plus a timezone component (adjustment in hours, minutes)
- Interval -- intervals represent time and date intervals. They are signed. An interval value can contain a subset of the interval fields, for example - hour to minute, year, day to second. Interval types are subdivided into:
  - year-month intervals -- may contain years, months or combination years/months value.
  - day-time intervals -- days, hours, minutes, seconds, fractions of second.

Data type declarations have the following general format:

Character (String)

CHAR [(length)]
CHARACTER [(length)]
VARCHAR (length)
CHARACTER VARYING (length)

length specifies the number of characters for fixed size strings (CHAR, CHARACTER); spaces are supplied for shorter strings. If length is missing for fixed size strings, the default length is 1. For variable size strings (VARCHAR, CHARACTER VARYING), length is the maximum size of the string. Strings exceeding length are truncated on the right.

Numeric

SMALLINT
INT
INTEGER

The integer types have default binary precision -- 15 for SMALLINT and 31 for INT, INTEGER.

NUMERIC ( precision [, scale] )
DECIMAL ( precision [, scale] )

Fixed point types have a decimal precision (total number of digits) and scale (which cannot exceed the precision). The default scale is 0. NUMERIC scales must be represented exactly. DECIMAL values can be stored internally with a larger scale (implementation defined).

FLOAT [(precision)]
REAL
DOUBLE PRECISION

The floating point types have a binary precision (maximum significant binary digits). Precision values are implementation dependent for REAL and DOUBLE PRECISION, although the standard states that the precision for DOUBLE PRECISION must be larger than for REAL. FLOAT also uses an implementation defined default for precision (commonly this is the same as for REAL), but the binary precision for FLOAT can be explicit.

Datetime

DATE
TIME [(scale)] [WITH TIME ZONE]
TIMESTAMP [(scale)] [WITH TIME ZONE]

TIME and TIMESTAMP allow an optional seconds fraction (scale). The default scale for TIME is 0, for TIMESTAMP 6. The optional WITH TIME ZONE specifier indicates that the timezone adjustment is stored with the value; if omitted, the current system timezone is assumed.

INTERVAL interval-qualifier

See below for a description of the interval-qualifier.

Interval Qualifier

An interval qualifier defines the specific type of an interval value. The qualifier for an interval type declares the sub-fields that comprise the interval, the precision of the highest (left-most) sub-field and the scale of the SECOND sub-field (if any).

Intervals are divided into sub-types -- year-month intervals and day-time intervals. Year-month intervals can only contain the sub-fields - year and month. Day-time intervals can contain day, hour, minute, second. The interval qualifier has the following formats:

YEAR [(precision)] [ TO MONTH ]

MONTH [(precision)]

{DAY|HOUR|MINUTE} [(precision)] [ TO SECOND [(scale)] ]

DAY [(precision)] [ TO {HOUR|MINUTE} ]

HOUR [(precision)] [ TO MINUTE ]

SECOND [ (precision [, scale]) ]

The default precision is 2. The default scale is 6.

Entity Integrity

As mentioned earlier, the relational model requires that each base table have a Primary Key. SQL92, on the other hand, allows a table to be created without a primary key. The advice here is to create all tables with primary keys.

A primary key is a constraint on the contents of a table. In relational terms, the primary key maintains Entity Integrity for the table. It constrains the table as follows:

For a given row, the set of values for the primary key columns must be unique from all other rows in the table,
No primary key column can contain a null, and
A table can have only one primary key (set of primary key columns).

Note: SQL92 does not require the second restriction on nulls in the primary key. However, it is required for a relational system.

Entity Integrity (Primary Keys) is enforced by the DBMS and ensures that every row has a proper unique identifier. The contents of any column in the table with Entity Integrity can be uniquely accessed with 3 pieces of information:

table identifier
primary key value
column name

This capability is crucial to a relational system. Having a clear, consistent identifier for table rows (and their columns) distinguishes relational systems from all others. It allows the establishment of relationships between tables, also crucial to relational systems. This is discussed below under Referential Integrity.

The primary key constraint in the CREATE TABLE Statement has two forms. When the primary key consists of a single column, it can be declared as a column constraint, simply PRIMARY KEY, attached to the column descriptor. For example:

sno VARCHAR(5) NOT NULL PRIMARY KEY

As a table constraint, it has the following format:

PRIMARY KEY ( column-1 [, column-2] ...)

column-1 and column-2 are the names of the columns of the primary key. For example,

PRIMARY KEY (sno, pno)

The order of columns in the primary key is not significant, except as the default order for the FOREIGN KEY table constraint. See Referential Integrity, below.
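The effect of a multi-column primary key can be sketched as follows. This is only an illustration, assuming Python's standard sqlite3 module and an in-memory SQLite database in place of a SQL92 DBMS; the sample rows are taken from the sp table used in this tutorial.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for a SQL92 DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sp (
        sno VARCHAR(5) NOT NULL,
        pno VARCHAR(5) NOT NULL,
        qty INT,
        PRIMARY KEY (sno, pno)
    )
""")
conn.execute("INSERT INTO sp VALUES ('S3', 'P1', 1000)")
# Same sno but a different pno: the (sno, pno) pair is still unique.
conn.execute("INSERT INTO sp VALUES ('S3', 'P2', 200)")

# A second row with the pair ('S3', 'P1') violates the primary key.
try:
    conn.execute("INSERT INTO sp VALUES ('S3', 'P1', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Table identifier + primary key value + column name locate one value.
qty = conn.execute(
    "SELECT qty FROM sp WHERE sno = 'S3' AND pno = 'P2'"
).fetchone()[0]
print(duplicate_rejected, qty)
```

Uniqueness applies to the combination of columns, not to each column on its own, which is why the second insert succeeds and the third does not.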

Referential Integrity

Foreign keys provide relationships between tables in the database. In the relational model, a foreign key in a table is a set of columns that reference the primary key of another table. For each row in the referencing table, the foreign key must match an existing primary key in the referenced table. The enforcement of this constraint is known as Referential Integrity.

Referential Integrity requires that:

The columns of a foreign key must match in number and type the columns of the primary key in the referenced table.
The values of the foreign key columns in each row of the referencing table must match the values of the corresponding primary key columns for a row in the referenced table.

The one exception to the second restriction is when the foreign key columns for a row contain nulls. Since primary keys should not contain nulls, a foreign key with nulls cannot match any row in the referenced table. However, a row with a foreign key where any foreign key column contains null is allowed in the referencing table. No corresponding primary key value in the referenced table is required when any one (or more) of the foreign key columns is null. Other columns in the foreign key may be null or non-null. Such a foreign key is a null reference, because it does not reference any row in the referenced table.
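A null reference can be illustrated with a short sketch. It assumes Python's standard sqlite3 module with an in-memory SQLite database; the contact table is hypothetical, and the PRAGMA is SQLite-specific (SQLite only enforces foreign keys when it is switched on).

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for a SQL92 DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("""
    CREATE TABLE s (
        sno VARCHAR(5) NOT NULL PRIMARY KEY,
        name VARCHAR(16),
        city VARCHAR(16)
    )
""")
# Hypothetical referencing table with a nullable foreign key column.
conn.execute("""
    CREATE TABLE contact (
        cid INT NOT NULL PRIMARY KEY,
        sno VARCHAR(5) REFERENCES s
    )
""")
conn.execute("INSERT INTO s VALUES ('S1', 'Pierre', 'Paris')")

conn.execute("INSERT INTO contact VALUES (1, 'S1')")  # matches a row in s
conn.execute("INSERT INTO contact VALUES (2, NULL)")  # null reference

# 'S9' is non-null and matches no row in s, so it is rejected.
try:
    conn.execute("INSERT INTO contact VALUES (3, 'S9')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

rows = conn.execute("SELECT cid, sno FROM contact ORDER BY cid").fetchall()
print(dangling_rejected, rows)
```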

Like other constraints, the referential integrity constraint restricts the contents of the referencing table, but it also may in effect restrict the contents of the referenced table. When a row in a table is referenced (through its primary key) by a foreign key in a row in another table, operations that affect its primary key columns have side-effects and may restrict the operation. Changing the primary key of a row, or deleting a row, which has referencing foreign keys would violate the referential integrity constraints on the referencing table if allowed to proceed. This is handled in one of two ways:

The referenced table is restricted from making the change (and violating referential integrity in the referencing table), or
Rows in the referencing table are modified so the referential integrity constraint is maintained.

These actions are controlled by the referential integrity effects declarations, called referential triggers by SQL92. The referential integrity effect actions defined for SQL are:

NO ACTION -- the change to the referenced (primary key) table is not performed. This is the default.
CASCADE -- the change to the referenced table is propagated to the referencing (foreign key) table.
SET NULL -- the foreign key columns in the referencing table are set to null.

Update and delete have separate action declarations. For CASCADE, update and delete also operate differently:

For update (the primary key column values have been modified), the corresponding foreign key columns for referencing rows are set to the new values.
For delete (the primary key row is deleted), the referencing rows are deleted.

A referential integrity constraint in the CREATE TABLE Statement has two forms. When the foreign key consists of a single column, it can be declared as a column constraint, like:

column-descr REFERENCES references-specification

As a table constraint, it has the following format:

FOREIGN KEY (column-list) REFERENCES references-specification

column-list lists the columns of the referencing table that comprise the foreign key. Commas separate column names in the list. Their order must match the explicit or implicit column list in the references-specification.

The references-specification has the following format:

table-2 [ ( referenced-columns ) ]
       [ ON UPDATE { CASCADE | SET NULL | NO ACTION }]
       [ ON DELETE { CASCADE | SET NULL | NO ACTION }]

The order of the ON UPDATE and ON DELETE clauses may be reversed. These clauses declare the effect action when the referenced primary key is updated or deleted. The default for ON UPDATE and ON DELETE is NO ACTION.

table-2 is the referenced table name (primary key table). The optional referenced-columns list the columns of the referenced primary key. Commas separate column names in the list. The default is the primary key list in declaration order.

Contrary to the relational model, SQL92 allows foreign keys to reference any set of columns declared with the UNIQUE constraint in the referenced table (even when the table has a primary key). In this case, the referenced-columns list is required.

Example table constraint for referential integrity (for the sp table):

FOREIGN KEY (sno)
REFERENCES s(sno)
ON DELETE NO ACTION
ON UPDATE CASCADE
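The two effect actions in this example can be observed with a small sketch. It assumes Python's standard sqlite3 module and an in-memory SQLite database (with its SQLite-specific foreign key PRAGMA) as a stand-in for a SQL92 DBMS; the foreign key on pno is left out to keep the example short.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for a SQL92 DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("""
    CREATE TABLE s (
        sno VARCHAR(5) NOT NULL PRIMARY KEY,
        name VARCHAR(16),
        city VARCHAR(16)
    )
""")
conn.execute("""
    CREATE TABLE sp (
        sno VARCHAR(5) NOT NULL,
        pno VARCHAR(5) NOT NULL,
        qty INT,
        PRIMARY KEY (sno, pno),
        FOREIGN KEY (sno)
            REFERENCES s(sno)
            ON DELETE NO ACTION
            ON UPDATE CASCADE
    )
""")
conn.executemany("INSERT INTO s VALUES (?, ?, ?)",
                 [("S2", "John", "London"), ("S3", "Mario", "Rome")])
conn.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                 [("S2", "P1", 200), ("S3", "P1", 1000), ("S3", "P2", 200)])

# ON UPDATE CASCADE: the new key value is propagated to sp.
conn.execute("UPDATE s SET sno = 'S9' WHERE sno = 'S3'")
cascaded = conn.execute("SELECT sno, pno FROM sp ORDER BY sno, pno").fetchall()

# ON DELETE NO ACTION: deleting a referenced supplier is refused.
try:
    conn.execute("DELETE FROM s WHERE sno = 'S2'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

remaining = conn.execute("SELECT COUNT(*) FROM s").fetchone()[0]
print(cascaded, delete_blocked, remaining)
```

The update shows the second way of handling the change (the referencing rows are modified), the delete shows the first (the referenced table is restricted).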

CREATE TABLE Examples

Creating the example tables:

CREATE TABLE s
(sno VARCHAR(5) NOT NULL PRIMARY KEY,
name VARCHAR(16),
city VARCHAR(16)
)

CREATE TABLE p
(pno VARCHAR(5) NOT NULL PRIMARY KEY,
descr VARCHAR(16),
color VARCHAR(8)
)

CREATE TABLE sp
(sno VARCHAR(5) NOT NULL REFERENCES s,
pno VARCHAR(5) NOT NULL REFERENCES p,
qty INT,
PRIMARY KEY (sno, pno)
)

Create for sp with a constraint that the qty column can't be negative:

CREATE TABLE sp
(sno VARCHAR(5) NOT NULL REFERENCES s,
pno VARCHAR(5) NOT NULL REFERENCES p,
qty INT CHECK (qty >= 0),
PRIMARY KEY (sno, pno)
)

CREATE VIEW Statement

The CREATE VIEW statement creates a new database view. A view is effectively a SQL query stored in the catalog. The CREATE VIEW has the following general format:


CREATE VIEW view-name [ ( column-list ) ] AS query-1
           [ WITH [CASCADED|LOCAL] CHECK OPTION ]

view-name is the name for the new view. column-list is an optional list of names for the columns of the view, comma separated. query-1 is any SELECT statement without an ORDER BY clause. The optional WITH CHECK OPTION clause is a constraint on updatable views.

column-list must have the same number of columns as the select list in query-1. If column-list is omitted, all items in the select list of query-1 must be named. In either case, duplicate column names are not allowed for a view.

The optional WITH CHECK OPTION clause only applies to updatable views. It affects SQL INSERT and UPDATE statements. If WITH CHECK OPTION is specified, the WHERE predicate for query-1 must evaluate to true for the added row or the changed row.

The CASCADED and LOCAL specifiers apply when the underlying table for query-1 is another view. CASCADED requests that WITH CHECK OPTION apply to all underlying views (to any level). LOCAL requests that the current WITH CHECK OPTION apply only to this view. In SQL92, CASCADED is assumed when neither specifier is given; check your DBMS documentation, since implementations may differ.

CREATE VIEW Examples

Parts with suppliers:

CREATE VIEW supplied_parts AS
           SELECT *
           FROM p
           WHERE pno IN (SELECT pno FROM sp)
      WITH CHECK OPTION

Access example:

SELECT * FROM supplied_parts

pno	descr	color
P1	Widget	Blue
P2	Widget	Red

Joined view:

CREATE VIEW part_locations (part, quantity, location) AS
           SELECT pno, qty, city
           FROM sp, s
           WHERE sp.sno = s.sno

Access examples:

SELECT * FROM part_locations

part	quantity	location
P1	NULL	Paris
P1	200	London
P1	1000	Rome
P2	200	Rome

SELECT part, quantity
FROM part_locations
WHERE location = 'Rome'

part	quantity
P1	1000
P2	200
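The joined view can be reproduced with a short sketch, assuming Python's standard sqlite3 module and an in-memory SQLite database as a stand-in for a SQL92 DBMS. The ORDER BY clauses are added only to make the output order predictable, and the extra sp row at the end is made-up data.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for a SQL92 DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE s (
        sno VARCHAR(5) NOT NULL PRIMARY KEY,
        name VARCHAR(16),
        city VARCHAR(16)
    )
""")
conn.execute("""
    CREATE TABLE sp (
        sno VARCHAR(5) NOT NULL,
        pno VARCHAR(5) NOT NULL,
        qty INT,
        PRIMARY KEY (sno, pno)
    )
""")
conn.executemany("INSERT INTO s VALUES (?, ?, ?)",
                 [("S1", "Pierre", "Paris"), ("S2", "John", "London"),
                  ("S3", "Mario", "Rome")])
conn.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                 [("S1", "P1", None), ("S2", "P1", 200),
                  ("S3", "P1", 1000), ("S3", "P2", 200)])

conn.execute("""
    CREATE VIEW part_locations (part, quantity, location) AS
        SELECT pno, qty, city
        FROM sp, s
        WHERE sp.sno = s.sno
""")

rome = conn.execute(
    "SELECT part, quantity FROM part_locations "
    "WHERE location = 'Rome' ORDER BY part"
).fetchall()

# The view is derived on each use, so a new base row shows up at once.
conn.execute("INSERT INTO sp VALUES ('S1', 'P2', 75)")  # made-up row
paris = conn.execute(
    "SELECT part, quantity FROM part_locations "
    "WHERE location = 'Paris' ORDER BY part"
).fetchall()
print(rome, paris)
```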

DROP TABLE Statement

The DROP TABLE Statement removes a previously created table and its description from the catalog. It has the following general format:

DROP TABLE table-name {CASCADE|RESTRICT}

table-name is the name of an existing base table in the current schema. The CASCADE and RESTRICT specifiers define the disposition of other objects dependent on the table. A base table may have two types of dependencies:

A view whose query specification references the drop table.
Another base table that references the drop table in a constraint - a CHECK constraint or REFERENCES constraint.

RESTRICT specifies that the table not be dropped if any dependencies exist. If dependencies are found, an error is returned and the table isn't dropped.

CASCADE specifies that any dependencies are removed before the drop is performed:

Views that reference the base table are dropped, and the sequence is repeated for their dependencies.
Constraints in other tables that reference this table are dropped; the constraint is dropped but the table retained.

DROP VIEW Statement

The DROP VIEW Statement removes a previously created view and its description from the catalog. It has the following general format:

DROP VIEW view-name {CASCADE|RESTRICT}

view-name is the name of an existing view in the current schema. The CASCADE and RESTRICT specifiers define the disposition of other objects dependent on the view. A view may have two types of dependencies:

A view whose query specification references the drop view.
A base table that references the drop view in a constraint - a CHECK constraint.

RESTRICT specifies that the view not be dropped if any dependencies exist. If dependencies are found, an error is returned and the view isn't dropped.

CASCADE specifies that any dependencies are removed before the drop is performed:

Views that reference the drop view are dropped, and the sequence is repeated for their dependencies.
Constraints in base tables that reference this view are dropped; the constraint is dropped but the table retained.

GRANT Statement

The GRANT Statement grants access privileges for database objects to other users. It has the following general format:

GRANT privilege-list ON [TABLE] object-list TO user-list

privilege-list is either ALL PRIVILEGES or a comma-separated list of privileges: SELECT, INSERT, UPDATE, DELETE. object-list is a comma-separated list of table and view names. user-list is either PUBLIC or a comma-separated list of user names.

The GRANT statement grants each privilege in privilege-list for each object (table) in object-list to each user in user-list. In general, the access privileges apply to all columns in the table or view, but it is possible to specify a column list with the UPDATE privilege specifier:

UPDATE [ ( column-1 [, column-2] ... ) ]

If the optional column list is specified, UPDATE privileges are granted for those columns only.

The user-list may specify PUBLIC. This is a general grant, applying to all users (and future users) in the catalog.

Privileges granted are revoked with the REVOKE Statement.

The optional specifier WITH GRANT OPTION may follow user-list in the GRANT statement. WITH GRANT OPTION specifies that, in addition to access privileges, the privilege to grant those privileges to other users is granted.

GRANT Statement Examples

GRANT SELECT ON s,sp TO PUBLIC

GRANT SELECT,INSERT,UPDATE(color) ON p TO art,nan

GRANT SELECT ON supplied_parts TO sam WITH GRANT OPTION

REVOKE Statement

The REVOKE Statement revokes access privileges for database objects previously granted to other users. It has the following general format:

REVOKE privilege-list ON [TABLE] object-list FROM user-list

The REVOKE Statement revokes each privilege in privilege-list for each object (table) in object-list from each user in user-list. All privileges must have been previously granted.

The user-list may specify PUBLIC. This must apply to a previous GRANT TO PUBLIC.

REVOKE Statement Examples

REVOKE SELECT ON s,sp FROM PUBLIC

REVOKE SELECT,INSERT,UPDATE(color) ON p FROM art,nan



REVOKE SELECT ON supplied_parts FROM sam

Source: http://www.firstsql.com/

SQL-Transaction Statements

SQL-Transaction Statements control transactions in database access. This subset of SQL is also called the Data Control Language for SQL (SQL DCL).

There are 2 SQL-Transaction Statements:

COMMIT Statement -- commit (make persistent) all changes for the current transaction
ROLLBACK Statement -- roll back (rescind) all changes for the current transaction

Transaction Overview

A database transaction is a larger unit that frames multiple SQL statements. A transaction ensures that the action of the framed statements is atomic with respect to recovery.

A SQL Modification Statement has limited effect. A given statement can only directly modify the contents of a single table (Referential Integrity effects may cause indirect modification of other tables.) The upshot is that operations which require modification of several tables must involve multiple modification statements. A classic example is a bank operation that transfers funds from one type of account to another, requiring updates to 2 tables. Transactions provide a way to group these multiple statements in one atomic unit.

In SQL92, there is no BEGIN TRANSACTION statement. A transaction begins with the execution of a SQL-Data statement when there is no current transaction. All subsequent SQL-Data statements until COMMIT or ROLLBACK become part of the transaction. Execution of a COMMIT Statement or ROLLBACK Statement completes the current transaction. A subsequent SQL-Data statement starts a new transaction.

In terms of direct effect on the database, it is the SQL Modification Statements that are the main consideration since they change data. The total set of changes to the database by the modification statements in a transaction are treated as an atomic unit through the actions of the transaction. The set of changes either:

Is made fully persistent in the database through the action of the COMMIT Statement, or
Has no persistent effect whatever on the database, through:
- the action of the ROLLBACK Statement,
- abnormal termination of the client requesting the transaction, or
- abnormal termination of the transaction by the DBMS. This may be an action by the system (deadlock resolution) or by an administrative agent, or it may be an abnormal termination of the DBMS itself. In the latter case, the DBMS must roll back any active transactions during recovery.

The DBMS must ensure that the effect of a transaction is not partial. All changes in a transaction must be made persistent, or no changes from the transaction must be made persistent.
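The all-or-nothing behavior can be sketched with the bank transfer mentioned above. This is an illustration only, assuming Python's standard sqlite3 module and an in-memory SQLite database; the checking and savings tables and their balances are hypothetical. As in SQL92, there is no explicit begin here: the module opens a transaction when the first modification statement runs.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for a SQL92 DBMS.
conn = sqlite3.connect(":memory:")
# Hypothetical account tables, one per account type.
conn.execute("CREATE TABLE checking (acct INT NOT NULL PRIMARY KEY, balance INT)")
conn.execute("CREATE TABLE savings (acct INT NOT NULL PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO checking VALUES (1, 100)")
conn.execute("INSERT INTO savings VALUES (1, 500)")
conn.commit()  # make the starting balances persistent


def balances():
    """Return (checking balance, savings balance) for account 1."""
    c = conn.execute("SELECT balance FROM checking WHERE acct = 1").fetchone()[0]
    s = conn.execute("SELECT balance FROM savings WHERE acct = 1").fetchone()[0]
    return c, s


def transfer(amount):
    """Move amount from savings to checking: two statements, two tables."""
    conn.execute("UPDATE savings SET balance = balance - ? WHERE acct = 1", (amount,))
    conn.execute("UPDATE checking SET balance = balance + ? WHERE acct = 1", (amount,))


# First transfer is abandoned: ROLLBACK rescinds both updates.
transfer(50)
conn.rollback()
after_rollback = balances()

# Second transfer is completed: COMMIT makes both updates persistent.
transfer(50)
conn.commit()
after_commit = balances()
print(after_rollback, after_commit)
```

In neither case is one table updated without the other, which is the point of framing the two statements in a single transaction.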

Transaction Isolation

In most cases, transactions are executed under a client connection to the DBMS. Multiple client connections can initiate transactions at the same time. This is known as concurrent transactions.

In the relational model, each transaction is completely isolated from other active transactions. After initiation, a transaction can only see changes to the database made by transactions committed prior to starting the new transaction. Changes made by concurrent transactions are not seen by SQL DML query and modification statements. This is known as full isolation or Serializable transactions.

SQL92 defines Serializable for transactions. However, fully serialized transactions can impact performance. For this reason, SQL92 allows additional isolation modes that reduce the isolation between concurrent transactions. SQL92 defines 3 other isolation modes, but support by existing DBMSs is often incomplete and doesn't always match the SQL92 modes. Check the documentation of your DBMS for more details.

Transaction isolation controls the visibility of changes between transactions in different sessions (connections). It determines if queries in one session can see changes made by a transaction in another session. There are 4 levels of transaction isolation. The level providing the greatest isolation from other transactions is Serializable.

At transaction isolation level Serializable, a transaction is fully isolated from changes made by other sessions. Queries issued under Serializable transactions cannot see any subsequent changes, committed or not, from other transactions. The effect is the same as if transactions were serial, that is, each transaction completing before another one is begun.

At the opposite end of the spectrum is Read Uncommitted. It is the lowest level of isolation. With Read Uncommitted, a session can read (query) subsequent changes made by other sessions, either committed or uncommitted. Read uncommitted transactions have the following characteristics:

Dirty Read -- a session can read rows changed by transactions in other sessions that have not been committed. If the other session then rolls back its transaction, subsequent reads of the same rows will find column values returned to previous values, deleted rows reappearing and rows inserted by the other transaction missing.
Non-repeatable Read -- a session can read a row in a transaction. Another session then changes the row (UPDATE or DELETE) and commits its transaction. If the first session subsequently re-reads the row in the same transaction, it will see the change.
Phantoms -- a session can read a set of rows in a transaction that satisfies a search condition (which might be all rows). Another session then generates a row (INSERT) that satisfies the search condition and commits its transaction. If the first session subsequently repeats the search in the same transaction, it will see the new row.

The other isolation levels (Read Committed, Repeatable Read and Serializable) will not read uncommitted changes, so Dirty Reads are not possible. The next level above Read Uncommitted is Read Committed, and the next above that is Repeatable Read.

In Read Committed isolation level, Dirty Reads are not possible, but Non-repeatable Reads and Phantoms are possible. In Repeatable Read isolation level, Dirty Reads and Non-repeatable Reads are not possible but Phantoms are. In Serializable, Dirty Reads, Non-repeatable Reads, and Phantoms are not possible.

The isolation provided by each transaction isolation level is summarized below (Y means the effect can occur at that level, N means it cannot):

	Dirty Reads	Non-repeatable Reads	Phantoms
Read Uncommitted	Y	Y	Y
Read Committed	N	Y	Y
Repeatable Read	N	N	Y
Serializable	N	N	N
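The absence of Dirty Reads can be observed with two sessions. The sketch below assumes Python's standard sqlite3 module and a temporary SQLite database file shared by two connections. SQLite does not offer the four SQL92 levels by name, so this only shows the behavior common to every level above Read Uncommitted; the inserted supplier row is made-up data.

```python
import os
import sqlite3
import tempfile

# Assumption: a temporary SQLite file shared by two connections (sessions).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("""
    CREATE TABLE s (
        sno VARCHAR(5) NOT NULL PRIMARY KEY,
        name VARCHAR(16),
        city VARCHAR(16)
    )
""")
writer.commit()

# The writer session inserts a row but does not commit yet.
writer.execute("INSERT INTO s VALUES ('S4', 'Anna', 'Oslo')")  # made-up row

# The reader session does not see the uncommitted row (no Dirty Read).
seen_before_commit = reader.execute("SELECT COUNT(*) FROM s").fetchall()[0][0]

# Once the writer commits, a new query in the reader session sees the row.
writer.commit()
seen_after_commit = reader.execute("SELECT COUNT(*) FROM s").fetchall()[0][0]

writer.close()
reader.close()
print(seen_before_commit, seen_after_commit)
```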

Note: SQL92 defines the SET TRANSACTION statement to set the transaction isolation level for a session, but most DBMSs support a function/method in the Client API as an alternative.

SQL-Schema Statements in Transactions

The third type of SQL statement, SQL-Schema Statements, may participate in the transaction mechanism. SQL-Schema statements can either be:

included in a transaction along with SQL-Data statements,
required to be in separate transactions, or
ignored by the transaction mechanism (can't be rolled back).

SQL92 leaves the choice up to the individual DBMS. It is implementation defined behavior.

COMMIT Statement

The COMMIT Statement terminates the current transaction and makes all changes under the transaction persistent. It commits the changes to the database. The COMMIT statement has the following general format:

COMMIT [WORK]

WORK is an optional keyword that does not change the semantics of COMMIT.

ROLLBACK Statement

The ROLLBACK Statement terminates the current transaction and rescinds all changes made under the transaction. It rolls back the changes to the database. The ROLLBACK statement has the following general format:

ROLLBACK [WORK]

WORK is an optional keyword that does not change the semantics of ROLLBACK.

Source: http://www.firstsql.com/

Thursday, March 12, 2009

SQL Modification Statements

INSERT Statement

The INSERT Statement adds one or more rows to a table. It has two formats:

INSERT INTO table-1 [(column-list)] VALUES (value-list)

and,

INSERT INTO table-1 [(column-list)] (query-specification)

The first form inserts a single row into table-1 and explicitly specifies the column values for the row. The second form uses the result of query-specification to insert one or more rows into table-1. The result rows from the query are the rows added to the insert table. Note: the query cannot reference table-1.

Both forms have an optional column-list specification. Only the columns listed will be assigned values. Unlisted columns are set to null, so unlisted columns must allow nulls. The values from the VALUES Clause (first form) or the columns from the query-specification rows (second form) are assigned to the corresponding column in column-list in order.

If the optional column-list is missing, the default column list is substituted. The default column list contains all columns in table-1 in the order they were declared in CREATE TABLE, or CREATE VIEW.

VALUES Clause

The VALUES Clause in the INSERT Statement provides a set of values to place in the columns of a new row. It has the following general format:

VALUES ( value-1 [, value-2] ... )

value-1 and value-2 are Literal Values or Scalar Expressions involving literals. They can also specify NULL.

The values list in the VALUES clause must match the explicit or implicit column list for INSERT in degree (number of items). They must also match the data type of corresponding column or be convertible to that data type.

INSERT Examples

INSERT INTO p (pno, color) VALUES ('P4', 'Brown')

Before:

pno	descr	color
P1	Widget	Blue
P2	Widget	Red
P3	Dongle	Green

After:

pno	descr	color
P1	Widget	Blue
P2	Widget	Red
P3	Dongle	Green
P4	NULL	Brown

INSERT INTO sp
SELECT s.sno, p.pno, 500
FROM s, p
WHERE p.color='Green' AND s.city='London'

Before

sno	pno	qty
S1	P1	NULL
S2	P1	200
S3	P1	1000
S3	P2	200

After

sno	pno	qty
S1	P1	NULL
S2	P1	200
S3	P1	1000
S3	P2	200
S2	P3	500
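
The two INSERT forms above can be tried out with a short script. This is a minimal sketch, assuming Python's built-in sqlite3 module and SQLite's SQL dialect (an assumption made for illustration; the text is not tied to SQLite). The table definitions are guesses at column types that fit the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s  (sno TEXT, name TEXT, city TEXT);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO s VALUES ('S1','Pierre','Paris'), ('S2','John','London'), ('S3','Mario','Rome');
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# Form 1: explicit column list; the unlisted column descr is set to null.
conn.execute("INSERT INTO p (pno, color) VALUES ('P4', 'Brown')")

# Form 2: the rows come from a query; with no column list the default list applies.
conn.execute("""
    INSERT INTO sp
    SELECT s.sno, p.pno, 500
    FROM s, p
    WHERE p.color = 'Green' AND s.city = 'London'
""")

new_part = conn.execute("SELECT pno, descr, color FROM p WHERE pno = 'P4'").fetchone()
new_shipments = conn.execute("SELECT sno, pno, qty FROM sp WHERE qty = 500").fetchall()
print(new_part)
print(new_shipments)
```

Python shows a database null as None, so the new part row carries None in its descr column.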

UPDATE Statement

The UPDATE statement modifies columns in selected table rows. It has the following general format:

UPDATE table-1 SET set-list [WHERE predicate]

The optional WHERE Clause has the same format as in the SELECT Statement. See WHERE Clause. The WHERE clause chooses which table rows to update. If it is missing, all rows in table-1 are updated.

The set-list contains assignments of new values for selected columns. See SET Clause.

The SET Clause expressions and WHERE Clause predicate can contain subqueries, but the subqueries cannot reference table-1. This prevents situations where results are dependent on the order of processing.

SET Clause

The SET Clause in the UPDATE Statement updates (assigns new value to) columns in the selected table rows. It has the following general format:

SET column-1 = value-1 [, column-2 = value-2] ...

column-1 and column-2 are columns in the Update table. value-1 and value-2 are expressions that can reference columns from the update table. They can also be the keyword NULL, which sets the column to null.

Since the assignment expressions can reference columns from the current row, the expressions are evaluated first. After the values of all Set expressions have been computed, they are then assigned to the referenced columns. This avoids results dependent on the order of processing.
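
One way to see the "evaluate first, assign afterwards" rule is to swap two columns in a single UPDATE. The sketch below assumes Python's sqlite3 module and SQLite, which documents the same evaluation order; the table pair and its columns a and b are hypothetical names used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pair (a INTEGER, b INTEGER)")  # hypothetical table
conn.execute("INSERT INTO pair VALUES (1, 2)")

# Both right-hand sides are computed from the row's old values,
# so the two columns exchange their contents.
conn.execute("UPDATE pair SET a = b, b = a")

swapped = conn.execute("SELECT a, b FROM pair").fetchone()
print(swapped)
```

If the assignments were applied one after another, both columns would end up holding the same value.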

UPDATE Examples

UPDATE sp SET qty = qty + 20

Before

sno	pno	qty
S1	P1	NULL
S2	P1	200
S3	P1	1000
S3	P2	200

After

sno	pno	qty
S1	P1	NULL
S2	P1	220
S3	P1	1020
S3	P2	220

UPDATE s
SET name = 'Tony', city = 'Milan'
WHERE sno = 'S3'

Before

sno	name	city
S1	Pierre	Paris
S2	John	London
S3	Mario	Rome

After

sno	name	city
S1	Pierre	Paris
S2	John	London
S3	Tony	Milan

DELETE Statement

The DELETE Statement removes selected rows from a table. It has the following general format:

DELETE FROM table-1 [WHERE predicate]

The optional WHERE Clause has the same format as in the SELECT Statement.
The WHERE clause chooses which table rows to delete. If it is missing, all rows in table-1 are removed.

The WHERE Clause predicate can contain subqueries, but the subqueries cannot reference table-1. This prevents situations where results are dependent on the order of processing.

DELETE Examples

DELETE FROM sp WHERE pno = 'P1'

Before

sno	pno	qty
S1	P1	NULL
S2	P1	200
S3	P1	1000
S3	P2	200

After

sno	pno	qty
S3	P2	200

DELETE FROM p WHERE pno NOT IN (SELECT pno FROM sp)

Before

pno	descr	color
P1	Widget	Blue
P2	Widget	Red
P3	Dongle	Green

After

pno	descr	color
P1	Widget	Blue
P2	Widget	Red
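
The DELETE examples can be replayed as a small script. This is a sketch under the assumption of Python's sqlite3 module and SQLite's dialect; it also shows the effect of leaving out the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# Remove parts that no supplier ships (the subquery reads sp, not p).
conn.execute("DELETE FROM p WHERE pno NOT IN (SELECT pno FROM sp)")
remaining_parts = [row[0] for row in conn.execute("SELECT pno FROM p ORDER BY pno")]

# Remove only the rows selected by the WHERE predicate.
conn.execute("DELETE FROM sp WHERE pno = 'P1'")
after_filtered = conn.execute("SELECT sno, pno, qty FROM sp").fetchall()

# Without a WHERE clause every row is removed.
conn.execute("DELETE FROM sp")
after_unfiltered = conn.execute("SELECT COUNT(*) FROM sp").fetchone()[0]

print(remaining_parts, after_filtered, after_unfiltered)
```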

Source: http://www.firstsql.com/

SQL : Extended Query Capabilities

ORDER BY Clause

The ORDER BY clause is optional. If used, it must be the last clause in the SELECT statement. The ORDER BY clause requests sorting for the results of a query.

When the ORDER BY clause is missing, the result rows from a query have no defined order (they are unordered). The ORDER BY clause defines the ordering of rows based on columns from the SELECT clause. The ORDER BY clause has the following general format:

ORDER BY column-1 [ASC|DESC] [, column-2 [ASC|DESC] ] ...

column-1, column-2, ... are column names specified (or implied) in the select list. If a select column is renamed (given a new name in the select entry), the new name is used in the ORDER BY list. ASC and DESC request ascending or descending sort for a column. ASC is the default.

ORDER BY sorts rows using the ordering columns in left-to-right, major-to-minor order. The rows are sorted first on the first column name in the list. If there are any duplicate values for the first column, the duplicates are sorted on the second column (within the first column sort) in the Order By list, and so on. There is no defined inner ordering for rows that have duplicate values for all Order By columns.

Database nulls require special processing in ORDER BY. A null column sorts higher than all regular values, so nulls come last in ascending order and first in descending order.

For ORDER BY, nulls are considered duplicates of each other. Sorting on columns that do not appear in the result would make the ordering impossible to interpret from the query output, which is also why SQL only allows select list columns in ORDER BY.

For convenience when using expressions in the select list, select items can be specified by number (starting with 1). Names and numbers can be intermixed.

Example queries:

SELECT * FROM sp ORDER BY 3 DESC

sno	pno	qty
S1	P1	NULL
S3	P1	1000
S3	P2	200
S2	P1	200

SELECT name, city FROM s ORDER BY name

name	city
John	London
Mario	Rome
Pierre	Paris

SELECT * FROM sp ORDER BY qty DESC, sno

sno	pno	qty
S1	P1	NULL
S3	P1	1000
S2	P1	200
S3	P2	200
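
The ORDER BY rules can be checked with a short script. This is a sketch assuming Python's sqlite3 module; SQLite by default places nulls before all other values, the opposite of the rule described above, so the first query adds an explicit (qty IS NULL) sort key to reproduce the "nulls sort high" behaviour. That extra key is an addition for this illustration, not part of the original examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# Major key: nulls first; then qty descending; ties on qty are broken by sno.
nulls_high = conn.execute("""
    SELECT sno, pno, qty FROM sp
    ORDER BY (qty IS NULL) DESC, qty DESC, sno
""").fetchall()

# Select items referenced by number: column 2 (pno) descending, then column 1 (sno).
by_position = conn.execute(
    "SELECT sno, pno, qty FROM sp ORDER BY 2 DESC, 1").fetchall()

print(nulls_high)
print(by_position)
```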

Expressions

In the previous subsection on basic Select statements, column values are used in the select list and where predicate. SQL allows a scalar value expression to be used instead. A SQL value expression can be a:

Literal -- quoted string, numeric value, datetime value
Function Call -- reference to builtin SQL function
System Value -- current date, current user, ...
Special Construct -- CAST, COALESCE, CASE
Numeric or String Operator -- combining sub-expressions

Literals

A literal is a typed value that is self-defining. SQL supports 3 types of literals:

String -- ASCII text framed by single quotes ('). Within a literal, a single quote is represented by 2 single quotes ('').
Numeric -- numeric digits (at least 1) with an optional decimal point and exponent, for example 42, 3.14 or 1.5E3.
Numeric literals with no exponent or decimal point are typed as Integer. Those with a decimal point but no exponent are typed as Decimal. Those with an exponent are typed as Float.
Datetime -- datetime literals begin with a keyword identifying the type, followed by a string literal:
- Date -- DATE 'yyyy-mm-dd'
- Time -- TIME 'hh:mm:ss[.fff]'
- Timestamp -- TIMESTAMP 'yyyy-mm-dd hh:mm:ss[.fff]'
- Interval -- INTERVAL [+|-] string interval-qualifier
The format of the string in the Interval literal depends on the interval qualifier. For year-month intervals, the format is: 'dd[-dd]'. For day-time intervals, the format is '[dd ]dd[:dd[:dd]][.fff]'.

SQL Functions

SQL has the following builtin functions:

SUBSTRING(exp-1 FROM exp-2 [FOR exp-3])
Extracts a substring from a string - exp-1, beginning at the integer value - exp-2, for the length of the integer value - exp-3. exp-2 is 1 relative. If FOR exp-3 is omitted, the length of the remaining string is used. Returns the substring.
UPPER(exp-1)
Converts any lowercase characters in a string - exp-1 to uppercase. Returns the converted string.
LOWER(exp-1)
Converts any uppercase characters in a string - exp-1 to lowercase. Returns the converted string.
TRIM([LEADING|TRAILING|BOTH] [FROM] exp-1)
TRIM([LEADING|TRAILING|BOTH] exp-2 FROM exp-1)
Trims leading, trailing or both characters from a string - exp-1. The trim character is a space, or if exp-2 is specified, it supplies the trim character. If LEADING, TRAILING, BOTH are missing, the default is BOTH. Returns the trimmed string.
POSITION(exp-1 IN exp-2)
Searches a string - exp-2, for a match on a substring - exp-1. Returns an integer, the 1 relative position of the match or 0 for no match.
CHAR_LENGTH(exp-1)
CHARACTER_LENGTH(exp-1)
Returns the integer number of characters in the string - exp-1.
OCTET_LENGTH(exp-1)
Returns the integer number of octets (8-bit bytes) needed to represent the string - exp-1.
EXTRACT(sub-field FROM exp-1)
Returns the numeric sub-field extracted from a datetime value - exp-1. sub-field is YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR or TIMEZONE_MINUTE. TIMEZONE_HOUR and TIMEZONE_MINUTE extract sub-fields from the Timezone portion of exp-1. QUARTER is (MONTH-1)/3+1, using integer division.
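
The 1-relative conventions of SUBSTRING and POSITION are easy to get wrong, so here is a small model of them in plain Python. It is only a sketch: the helper names sql_substring, sql_position and quarter are hypothetical, start positions below 1 are not handled, and quarter assumes the usual three-month quarters, that is (MONTH-1)/3+1 with integer division.

```python
def sql_substring(text, start, length=None):
    """Model of SUBSTRING(text FROM start [FOR length]), assuming start >= 1."""
    begin = start - 1                      # SQL counts from 1, Python from 0
    end = None if length is None else begin + length
    return text[begin:end]

def sql_position(needle, haystack):
    """Model of POSITION(needle IN haystack): 1-relative index, 0 if absent."""
    return haystack.find(needle) + 1       # str.find returns -1 when absent

def quarter(month):
    """Quarter (1 to 4) of a month number (1 to 12)."""
    return (month - 1) // 3 + 1

print(sql_substring("database", 1, 4))
print(sql_substring("database", 5))
print(sql_position("base", "database"))
print([quarter(m) for m in range(1, 13)])
```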

System Values

SQL System Values are reserved names used to access builtin values:

USER -- returns a string with the current SQL authorization identifier.
CURRENT_USER -- same as USER.
SESSION_USER -- returns a string with the current SQL session authorization identifier.
SYSTEM_USER -- returns a string with the current operating system user.
CURRENT_DATE -- returns a Date value for the current system date.
CURRENT_TIME -- returns a Time value for the current system time.
CURRENT_TIMESTAMP -- returns a Timestamp value for the current system timestamp.

SQL Special Constructs

SQL supports a set of special expression constructs:

CAST(exp-1 AS data-type)
Converts the value - exp-1, into the specified data-type. Returns the converted value.
COALESCE(exp-1, exp-2 [, exp-3] ...)
Returns exp-1 if it is not null, otherwise returns exp-2 if it is not null, otherwise returns exp-3, and so on. Returns null if all values are null.
CASE exp-1 { WHEN exp-2 THEN exp-3 } ... [ELSE exp-4] END
CASE { WHEN predicate-1 THEN exp-3 } ... [ELSE exp-4] END
The first form of the CASE construct compares exp-1 to exp-2 in each WHEN clause. If a match is found, CASE returns exp-3 from the corresponding THEN clause. If no matches are found, it returns exp-4 from the ELSE clause or null if the ELSE clause is omitted.
The second form of the CASE construct evaluates predicate-1 in each WHEN clause. If the predicate is true, CASE returns exp-3 from the corresponding THEN clause. If no predicates evaluate to true, it returns exp-4 from the ELSE clause or null if the ELSE clause is omitted.
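
A short script shows the three constructs side by side. This is a sketch assuming Python's sqlite3 module and SQLite's dialect; the size labels 'unknown', 'large' and 'small' and the threshold 500 are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

rows = conn.execute("""
    SELECT sno,
           COALESCE(qty, 0),                       -- first non-null argument
           CASE WHEN qty IS NULL THEN 'unknown'    -- searched CASE (second form)
                WHEN qty >= 500  THEN 'large'
                ELSE 'small' END,
           CAST(qty AS TEXT)                       -- a null stays null after CAST
    FROM sp
    ORDER BY sno, pno
""").fetchall()

for row in rows:
    print(row)
```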

Expression Operators

Expression operators combine 2 subexpressions to calculate a value. There are 2 basic types -- numeric and string.

String Operators
There is just one string operator - ||, for string concatenation. Both operands of || must be strings. The operator concatenates the second string to the end of the first. For example, 'abc' || 'def' yields 'abcdef'.
Numeric operators
The numeric operators are common to most languages:
- + addition
- - subtraction
- * multiplication
- / division
All numeric operators can be used on the standard numeric data types:
- Integer -- TINYINT, SMALLINT, INT, BIGINT
- Exact -- NUMERIC, DECIMAL
- Approximate -- FLOAT, DOUBLE, REAL
Automatic conversion is provided for numeric operators. If an integer type is combined with an exact type, the integer is converted to exact before the operation. If an exact (or integer) type is combined with an approximate type, it is converted to approximate before the operation.
The + and - operators can also be used as unary operators.
The numeric operators can be applied to datetime values, with some restrictions. The basic rules for datetime expressions are:
- A date, time, timestamp value can be added to an interval; result is a date, time, timestamp value.
- An interval value can be subtracted from a date, time, timestamp value; result is a date, time, timestamp value.
- An interval value can be added to or subtracted from another interval; result is an interval value.
- An interval can be multiplied by or divided by a standard numeric value; result is an interval value.
A special form can be used to subtract a date, time, timestamp value from another date, time, timestamp value to yield an interval value:
(datetime-1 - datetime-2) interval-qualifier
The interval-qualifier specifies the specific interval type for the result.
A second special form allows a ? parameter to be typed as an interval:

In expressions, parentheses are used for grouping.
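
The operators and the automatic numeric conversion can be observed directly. The sketch below assumes Python's sqlite3 module; in SQLite, dividing two integers gives an integer, while mixing in a floating point operand gives a floating point result. Other engines may treat integer division differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

result = conn.execute("""
    SELECT 'abc' || 'def',   -- string concatenation
           7 / 2,            -- integer operands
           7 / 2.0,          -- integer converted to approximate before dividing
           -(3 + 4) * 2      -- unary minus and parentheses for grouping
""").fetchone()

print(result)
```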

Joining Tables

The FROM clause allows more than 1 table in its list, however simply listing more than one table will very rarely produce the expected results. The rows from one table must be correlated with the rows of the others. This correlation is known as joining.

An example can best illustrate the rationale behind joins. The following query:

SELECT * FROM sp, p

Produces:

sno	pno	qty	pno	descr	color
S1	P1	NULL	P1	Widget	Blue
S1	P1	NULL	P2	Widget	Red
S1	P1	NULL	P3	Dongle	Green
S2	P1	200	P1	Widget	Blue
S2	P1	200	P2	Widget	Red
S2	P1	200	P3	Dongle	Green
S3	P1	1000	P1	Widget	Blue
S3	P1	1000	P2	Widget	Red
S3	P1	1000	P3	Dongle	Green
S3	P2	200	P1	Widget	Blue
S3	P2	200	P2	Widget	Red
S3	P2	200	P3	Dongle	Green

Each row in sp is arbitrarily combined with each row in p, giving 12 result rows (4 rows in sp X 3 rows in p.) This is known as a cartesian product.

A more usable query would correlate the rows from sp with rows from p, for instance matching on the common column -- pno:

SELECT *
FROM sp, p
WHERE sp.pno = p.pno

This produces:

sno	pno	qty	pno	descr	color
S1	P1	NULL	P1	Widget	Blue
S2	P1	200	P1	Widget	Blue
S3	P1	1000	P1	Widget	Blue
S3	P2	200	P2	Widget	Red

Rows for each part in p are combined with rows in sp for the same part by matching on part number (pno). In this query, the WHERE Clause provides the join predicate, matching pno from p with pno from sp.

The join in this example is known as an inner equi-join. Equi means that the join predicate uses = (equals) to match the join columns. Other types of joins use different comparison operators. For example, a query might use a greater-than join.

The term inner means only rows that match are included. Rows in the first table that have no matching rows in the second table are excluded and vice versa (in the above join, the row in p with pno P3 is not included in the result.) An outer join includes unmatched rows in the result.

More than 2 tables can participate in a join. This is basically just an extension of a 2 table join. 3 tables -- a, b, c, might be joined in various ways:

a joins b which joins c
a joins b and the join of a and b joins c
a joins b and a joins c

Plus several other variations. With inner joins, this structure is not explicit. It is implicit in the nature of the join predicates. With outer joins, it is explicit.

This query performs a 3 table join:

SELECT name, qty, descr, color
FROM s, sp, p
WHERE s.sno = sp.sno
AND sp.pno = p.pno

It joins s to sp and sp to p, producing:

name	qty	descr	color
Pierre	NULL	Widget	Blue
John	200	Widget	Blue
Mario	1000	Widget	Blue
Mario	200	Widget	Red

Note that the order of tables listed in the FROM clause has no significance, nor does the order of join predicates in the WHERE clause.
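
The difference between a cartesian product and a join is easy to verify by counting rows. This is a sketch assuming Python's sqlite3 module and SQLite; the ORDER BY clauses are added only to make the printed output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s  (sno TEXT, name TEXT, city TEXT);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO s VALUES ('S1','Pierre','Paris'), ('S2','John','London'), ('S3','Mario','Rome');
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# No join predicate: every sp row is paired with every p row.
cartesian_count = conn.execute("SELECT COUNT(*) FROM sp, p").fetchone()[0]

# Inner equi-join on the common column pno.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM sp, p WHERE sp.pno = p.pno").fetchone()[0]

# Three table join: s to sp, and sp to p.
three_way = conn.execute("""
    SELECT name, qty, descr, color
    FROM s, sp, p
    WHERE s.sno = sp.sno AND sp.pno = p.pno
    ORDER BY s.sno, p.pno
""").fetchall()

print(cartesian_count, inner_count)
print(three_way)
```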

Outer Joins

An inner join excludes rows from either table that don't have a matching row in the other table. An outer join provides the ability to include unmatched rows in the query results. The outer join combines the unmatched row in one of the tables with an artificial row for the other table. This artificial row has all columns set to null.

The outer join is specified in the FROM clause and has the following general format:

table-1 { LEFT | RIGHT | FULL } OUTER JOIN table-2 ON predicate-1

predicate-1 is a join predicate for the outer join. It can only reference columns from the joined tables. The LEFT, RIGHT or FULL specifiers give the type of join:

LEFT -- unmatched rows are retained only from the left side table (table-1)
RIGHT -- unmatched rows are retained only from the right side table (table-2)
FULL -- unmatched rows from both tables (table-1 and table-2) are retained

Outer join example:

SELECT p.pno, descr, color, sno, qty
FROM p LEFT OUTER JOIN sp ON p.pno = sp.pno

pno	descr	color	sno	qty
P1	Widget	Blue	S1	NULL
P1	Widget	Blue	S2	200
P1	Widget	Blue	S3	1000
P2	Widget	Red	S3	200
P3	Dongle	Green	NULL	NULL
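
The left outer join can be run as follows. This is a sketch assuming Python's sqlite3 module and SQLite; the select list qualifies pno as p.pno because both tables have a column with that name, and the ORDER BY is added for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

outer_rows = conn.execute("""
    SELECT p.pno, descr, color, sno, qty
    FROM p LEFT OUTER JOIN sp ON p.pno = sp.pno
    ORDER BY p.pno, sno
""").fetchall()

for row in outer_rows:
    print(row)
```

The last row belongs to part P3, which has no supplier; its sno and qty come from the artificial all-null row and appear as None in Python.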

Self Joins

A query can join a table to itself. Self joins have a number of real world uses. For example, a self join can determine which parts have more than one supplier:

SELECT DISTINCT a.pno
FROM sp a, sp b
WHERE a.pno = b.pno
AND a.sno <> b.sno

pno
P1

As illustrated in the above example, self joins use correlation names to distinguish columns in the select list and where predicate. In this case, the references to the same table are renamed - a and b.

Self joins are often used in subqueries.
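
The self join above can be executed as a quick check. This is a sketch assuming Python's sqlite3 module and SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# a and b are two correlation names for the same table sp.
multi_supplier_parts = conn.execute("""
    SELECT DISTINCT a.pno
    FROM sp a, sp b
    WHERE a.pno = b.pno
      AND a.sno <> b.sno
""").fetchall()

print(multi_supplier_parts)
```

Without DISTINCT, P1 would be listed once for every ordered pair of different suppliers.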

Subqueries

Subqueries are an identifying feature of SQL. It is called Structured Query Language because a query can nest inside another query.

There are 3 basic types of subqueries in SQL:

Predicate Subqueries -- extended logical constructs in the WHERE (and HAVING) clause.
Scalar Subqueries -- standalone queries that return a single value; they can be used anywhere a scalar value is used.
Table Subqueries -- queries nested in the FROM clause.

All subqueries must be enclosed in parentheses.

Predicate Subqueries

Predicate subqueries are used in the WHERE (and HAVING) clause. Each is a special logical construct. Except for EXISTS, predicate subqueries must retrieve one column (in their select list.)

IN Subquery

The IN Subquery tests whether a scalar value matches the single query column value in any subquery result row. It has the following general format:

value-1 [NOT] IN (query-1)

Using NOT is equivalent to:

NOT value-1 IN (query-1)

For example, to list parts that have suppliers:

SELECT *
FROM p
WHERE pno IN (SELECT pno FROM sp)

The Self Join example in the previous subsection can be expressed with an IN Subquery:

SELECT DISTINCT pno
FROM sp a
WHERE pno IN (SELECT pno FROM sp b WHERE a.sno <> b.sno)

Note that the subquery where clause references a column in the outer query (a.sno). This is known as an outer reference. Subqueries with outer references are sometimes known as correlated subqueries.

Quantified Subqueries

A quantified subquery allows several types of tests and can use the full set of comparison operators. It has the following general format:

value-1 {=|>|<|>=|<=|<>} {ANY|ALL|SOME} (query-1)

The comparison operator specifies how to compare value-1 to the single query column value from each subquery result row. The ANY, ALL, SOME specifiers give the type of match expected. ANY and SOME must match at least one row in the subquery. ALL must match all rows in the subquery, or the subquery must be empty (produce no rows).

For example, to list all parts that have suppliers:

SELECT *
FROM p
WHERE pno = ANY (SELECT pno FROM sp)

A self join is used to list the supplier with the highest quantity of each part (ignoring null quantities):

SELECT *
FROM sp a
WHERE qty > ALL (SELECT qty FROM sp b
                 WHERE a.pno = b.pno
                 AND a.sno <> b.sno
                 AND b.qty IS NOT NULL)

sno	pno	qty
S3	P1	1000
S3	P2	200

EXISTS Subqueries

The EXISTS Subquery tests whether a subquery retrieves at least one row, that is, whether a qualifying row exists. It has the following general format:

EXISTS (query-1)

Any useful EXISTS subquery will contain an outer reference; that is, it will be a correlated subquery.

Note: the select list in the EXISTS subquery is not actually used in evaluating the EXISTS, so it can contain any valid select list (though * is normally used).

To list parts that have suppliers:

SELECT *
FROM p
WHERE EXISTS (SELECT * FROM sp WHERE sp.pno = p.pno)
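
The predicate subqueries described above can be exercised with a short script. This is a sketch assuming Python's sqlite3 module and SQLite. SQLite does not accept the ANY/ALL/SOME quantifiers, so the "highest quantity of each part" query is rewritten here with NOT EXISTS; that rewrite is a substitute chosen for this illustration, not the quantified form itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# IN subquery: parts that have suppliers.
in_parts = conn.execute(
    "SELECT pno FROM p WHERE pno IN (SELECT pno FROM sp) ORDER BY pno").fetchall()

# EXISTS subquery with an outer reference (p.pno): same question.
exists_parts = conn.execute("""
    SELECT pno FROM p
    WHERE EXISTS (SELECT * FROM sp WHERE sp.pno = p.pno)
    ORDER BY pno
""").fetchall()

# Supplier with the highest non-null quantity of each part:
# keep a row when no other row for the same part has a larger qty.
top_suppliers = conn.execute("""
    SELECT sno, pno, qty FROM sp a
    WHERE a.qty IS NOT NULL
      AND NOT EXISTS (SELECT * FROM sp b
                      WHERE b.pno = a.pno AND b.qty > a.qty)
    ORDER BY pno
""").fetchall()

print(in_parts, exists_parts, top_suppliers)
```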

Scalar Subqueries

The Scalar Subquery can be used anywhere a value can be used. The subquery must reference just one column in the select list. It must also retrieve no more than one row.

When the subquery returns a single row, the value of the single select list column becomes the value of the Scalar Subquery. When the subquery returns no rows, a database null is used as the result of the subquery. Should the subquery retrieve more than one row, it is a run-time error and aborts query execution.

A Scalar Subquery can appear as a scalar value in the select list and where predicate of another query. The following query on the sp table uses a Scalar Subquery in the select list to retrieve the supplier city associated with the supplier number (sno column in sp):

SELECT pno, qty, (SELECT city FROM s WHERE s.sno = sp.sno)
FROM sp

pno	qty	city
P1	NULL	Paris
P1	200	London
P1	1000	Rome
P2	200	Rome

The next query on the sp table uses a Scalar Subquery in the where clause to match parts on the color associated with the part number (pno column in sp):

SELECT *
FROM sp
WHERE 'Blue' = (SELECT color FROM p WHERE p.pno = sp.pno)

sno	pno	qty
S1	P1	NULL
S2	P1	200
S3	P1	1000

Note that both example queries use outer references. This is normal in Scalar Subqueries. Often, Scalar Subqueries are Aggregate Queries.
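
Both scalar subquery examples can be run as written. This is a sketch assuming Python's sqlite3 module and SQLite; the ORDER BY clauses and the city alias are added only to fix the output order and the column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s  (sno TEXT, name TEXT, city TEXT);
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO s VALUES ('S1','Pierre','Paris'), ('S2','John','London'), ('S3','Mario','Rome');
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# Scalar subquery in the select list: one city per sp row.
with_city = conn.execute("""
    SELECT pno, qty, (SELECT city FROM s WHERE s.sno = sp.sno) AS city
    FROM sp
    ORDER BY sno, pno
""").fetchall()

# Scalar subquery in the where predicate: shipments of blue parts.
blue_shipments = conn.execute("""
    SELECT sno, pno, qty
    FROM sp
    WHERE 'Blue' = (SELECT color FROM p WHERE p.pno = sp.pno)
    ORDER BY sno
""").fetchall()

print(with_city)
print(blue_shipments)
```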

Table Subqueries

Table Subqueries are queries used in the FROM clause, replacing a table name. Basically, the result set of the Table Subquery acts like a base table in the from list. Table Subqueries can have a correlation name in the from list. They can also be in outer joins.

The following two queries produce the same result:

SELECT p.*, qty
FROM p, sp
WHERE p.pno = sp.pno
AND sno = 'S3'

pno	descr	color	qty
P1	Widget	Blue	1000
P2	Widget	Red	200

SELECT p.*, qty
FROM p, (SELECT pno, qty FROM sp WHERE sno = 'S3') sp
WHERE p.pno = sp.pno

pno	descr	color	qty
P1	Widget	Blue	1000
P2	Widget	Red	200
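
The equivalence of the plain join and the Table Subquery can be confirmed by running both. This is a sketch assuming Python's sqlite3 module and SQLite; the correlation name t given to the subquery is a hypothetical name chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

plain_join = conn.execute("""
    SELECT p.*, qty
    FROM p, sp
    WHERE p.pno = sp.pno AND sno = 'S3'
    ORDER BY p.pno
""").fetchall()

# The subquery result acts like a table named t in the from list.
table_subquery = conn.execute("""
    SELECT p.*, t.qty
    FROM p, (SELECT pno, qty FROM sp WHERE sno = 'S3') t
    WHERE p.pno = t.pno
    ORDER BY p.pno
""").fetchall()

print(plain_join == table_subquery, table_subquery)
```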

Grouping Queries

A Grouping Query is a special type of query that groups and summarizes rows. It uses the GROUP BY Clause.

A Grouping Query groups rows based on common values in a set of grouping columns. Rows with the same values for the grouping columns are placed in the same group, and each distinct combination of values forms its own group. Each group is treated as a single row in the query result.

Even though a group is treated as a single row, the underlying rows can be subject to summary operations known as Set Functions whose results can be included in the query. The optional HAVING Clause supports filtering for group rows in the same manner as the WHERE clause filters FROM rows.

For example, grouping the sp table on the pno column produces 2 groups:

sno	pno	qty
S1	P1	NULL	'P1' Group
S2	P1	200
S3	P1	1000
S3	P2	200	'P2' Group

The P1 group contains 3 sp rows with pno='P1'
The P2 group contains a single sp row with pno='P2'

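Nulls get special treatment by GROUP BY. In the behavior described here, GROUP BY considers a null as distinct from every other null, so each row that has a null in one of its grouping columns forms a separate group. Note that SQL-92 and most current database systems instead place all rows with null grouping values into a single group.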

Grouping the sp table on the qty column produces 3 groups:

sno	pno	qty
S1	P1	NULL	NULL Group
S2	P1	200	200 Group
S3	P2	200	200 Group
S3	P1	1000	1000 Group

The row where qty is null forms a separate group.

GROUP BY Clause

GROUP BY is an optional clause in a query. It follows the WHERE clause or the FROM clause if the WHERE clause is missing. A query containing a GROUP BY clause is a Grouping Query. The GROUP BY clause has the following general format:

GROUP BY column-1 [, column-2] ...

column-1 and column-2 are the grouping columns. They must be names of columns from tables in the FROM clause; they can't be expressions.

GROUP BY operates on the rows from the FROM clause as filtered by the WHERE clause. It collects the rows into groups based on common values in the grouping columns. Except nulls, rows with the same set of values for the grouping columns are placed in the same group. If any grouping column for a row contains a null, the row is given its own group.

For example,

SELECT pno
FROM sp
GROUP BY pno

pno
P1
P2

In Grouping Queries, the select list can only contain grouping columns, plus literals, outer references and expressions involving these elements. Non-grouping columns from the underlying FROM tables cannot be referenced directly. However, non-grouping columns can be used in the select list as arguments to Set Functions. Set Functions summarize columns from the underlying rows of a group.

Set Functions

Set Functions are special summarizing functions used with Grouping Queries and Aggregate Queries. They summarize columns from the underlying rows of a group or aggregate.

Using the Group By example from above, grouping the sp table on the pno column:

sno	pno	qty
S1	P1	NULL	'P1' Group
S2	P1	200
S3	P1	1000
S3	P2	200	'P2' Group

A Set Function can compute the total quantities for each group:

sno	pno	qty		qty total
S1	P1	NULL	'P1' Group	1200
S2	P1	200
S3	P1	1000
S3	P2	200	'P2' Group	200

Null columns are ignored in computing the summary. The Set Function -- SUM, computes the arithmetic sum of a numeric column in a set of grouped/aggregate rows. For example,

SELECT pno, SUM(qty)
FROM sp
GROUP BY pno

pno
P1	1200
P2	200

Set Functions have the following general format:

set-function ( [DISTINCT|ALL] column-1 )

set-function is:

COUNT -- count of rows
SUM -- arithmetic sum of numeric column
AVG -- arithmetic average of numeric column; equivalent to SUM()/COUNT() over the non-null values.
MIN -- minimum value found in column
MAX -- maximum value found in column

The result of the COUNT function is always integer. The result of all other Set Functions is the same data type as the argument.

The Set Functions skip columns with nulls, summarizing non-null values. COUNT counts rows with non-null values, AVG averages non-null values, and so on. COUNT returns 0 when no non-null column values are found; the other functions return null when there are no values to summarize.

A Set Function argument can be a column or a scalar expression.

The DISTINCT and ALL specifiers are optional. ALL specifies that all non-null values are summarized; it is the default. DISTINCT specifies that distinct column values are summarized; duplicate values are skipped. Note: DISTINCT has no effect on MIN and MAX results.

COUNT also has an alternate format:

COUNT(*)

... which counts the underlying rows regardless of column contents.

Set Function examples:

SELECT pno, MIN(sno), MAX(qty), AVG(qty), COUNT(DISTINCT sno)
FROM sp
GROUP BY pno

pno
P1	S1	1000	600	3
P2	S3	200	200	1

SELECT sno, COUNT(*) parts
FROM sp
GROUP BY sno

sno	parts
S1	1
S2	1
S3	2
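
The way Set Functions skip nulls shows up clearly when COUNT(*) and COUNT(qty) are placed next to each other. This is a sketch assuming Python's sqlite3 module and SQLite, where AVG returns a floating point value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

summary = conn.execute("""
    SELECT pno,
           COUNT(*),             -- all rows in the group
           COUNT(qty),           -- only rows with a non-null qty
           SUM(qty),
           AVG(qty),             -- SUM(qty) / COUNT(qty), nulls ignored
           MIN(sno),
           COUNT(DISTINCT sno)
    FROM sp
    GROUP BY pno
    ORDER BY pno
""").fetchall()

for row in summary:
    print(row)
```

For the P1 group, COUNT(*) is 3 but COUNT(qty) is 2, which is why the average is 600 rather than 400.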

HAVING Clause

The HAVING Clause is associated with Grouping Queries and Aggregate Queries. It is optional in both cases. In Grouping Queries, it follows the GROUP BY clause. In Aggregate Queries, HAVING follows the WHERE clause or the FROM clause if the WHERE clause is missing.

The HAVING Clause has the following general format:

HAVING predicate

Like the WHERE Clause, HAVING filters the query result rows. WHERE filters the rows from the FROM clause. HAVING filters the grouped rows (from the GROUP BY clause) or the aggregate row (for Aggregate Queries).

predicate is a logical expression referencing grouped columns and set functions. It has the same restrictions as the select list for Grouping Queries and Aggregate Queries.

If the Having predicate evaluates to true for a grouped or aggregate row, the row is included in the query result, otherwise, the row is skipped (not included in the query result).

For example,

SELECT sno, COUNT(*) parts
FROM sp
GROUP BY sno
HAVING COUNT(*) > 1

sno	parts
S3	2
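
To contrast the two filters, the sketch below runs one query that filters groups with HAVING and one that filters rows with WHERE before grouping. It assumes Python's sqlite3 module and SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# HAVING filters the grouped rows: only suppliers with more than one part.
having_rows = conn.execute("""
    SELECT sno, COUNT(*) AS parts
    FROM sp
    GROUP BY sno
    HAVING COUNT(*) > 1
""").fetchall()

# WHERE filters the underlying rows before the groups are formed.
where_rows = conn.execute("""
    SELECT sno, COUNT(*) AS parts
    FROM sp
    WHERE qty IS NOT NULL
    GROUP BY sno
    ORDER BY sno
""").fetchall()

print(having_rows)
print(where_rows)
```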

Aggregate Queries

An Aggregate Query can use Set Functions and a HAVING Clause. It is similar to a Grouping Query except there are no grouping columns. The underlying rows from the FROM and WHERE clauses are grouped into a single aggregate row. An Aggregate Query always returns a single row, except when the Having clause is used.

An Aggregate Query is a query containing Set Functions in the select list but no GROUP BY clause. The Set Functions operate on the columns of the underlying rows of the single aggregate row. Except for outer references, any columns used in the select list must be arguments to Set Functions.

An aggregate query may also have a Having clause. The Having clause filters the single aggregate row. If the Having predicate evaluates to true, the query result contains the aggregate row. Otherwise, the query result contains no rows.

For example,

SELECT COUNT(DISTINCT pno) number_parts, SUM(qty) total_parts
FROM sp

number_parts	total_parts
2	1400

Subqueries are often Aggregate Queries. For example, parts with suppliers:

SELECT *
FROM p
WHERE (SELECT COUNT(*) FROM sp WHERE sp.pno=p.pno) > 0

pno	descr	color
P1	Widget	Blue
P2	Widget	Red

Parts with multiple suppliers:

SELECT *
FROM p
WHERE (SELECT COUNT(DISTINCT sno) FROM sp WHERE sp.pno=p.pno) > 1

pno	descr	color
P1	Widget	Blue
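
An Aggregate Query returns its single row even when no underlying rows qualify, which the sketch below illustrates. It assumes Python's sqlite3 module and SQLite; the part number 'P9' is a made-up value that matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

totals = conn.execute(
    "SELECT COUNT(DISTINCT pno), SUM(qty) FROM sp").fetchone()

# No rows qualify: COUNT gives 0, the other Set Functions give null.
empty = conn.execute(
    "SELECT COUNT(qty), SUM(qty), MAX(qty) FROM sp WHERE pno = 'P9'").fetchone()

print(totals)
print(empty)
```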

Union Queries

The SQL UNION operator combines the results of two queries into a composite result. The component queries can be SELECT/FROM queries with optional WHERE/GROUP BY/HAVING clauses. The UNION operator has the following general format:

query-1 UNION [ALL] query-2

query-1 and query-2 are full query specifications. The UNION operator creates a new query result that includes rows from each component query.

By default, UNION eliminates duplicate rows in its composite results. The optional ALL specifier requests that duplicates be retained in the UNION result.

The component queries of a Union Query can also be Union Queries themselves. Parentheses are used for grouping queries.

The select lists from the component queries must be union-compatible. They must match in degree (number of columns). For Entry Level SQL92, the column descriptor (data type and precision, scale) for each corresponding column must match. The rules for Intermediate Level SQL92 are less restrictive.

Union-Compatible Queries

For Entry Level SQL92, each corresponding column of both queries must have the same column descriptor in order for two queries to be union-compatible. The rules are less restrictive for Intermediate Level SQL92. It supports automatic conversion within type categories. In general, the resulting data type will be the broader type. The corresponding columns need only be in the same data type category:

Character (String) -- fixed/variable length
Bit String -- fixed/variable length
Exact Numeric (fixed point) -- integer/decimal
Approximate Numeric (floating point) -- float/double
Datetime -- sub-category must be the same,
- Date
- Time
- Timestamp
Interval -- sub-category must be the same,
- Year-month
- Day-time

UNION Examples

SELECT * FROM sp
UNION
SELECT CAST(' ' AS VARCHAR(5)), pno, CAST(0 AS INT)
FROM p
WHERE pno NOT IN (SELECT pno FROM sp)

sno	pno	qty
S1	P1	NULL
S2	P1	200
S3	P1	1000
S3	P2	200
	P3	0
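
The effect of the ALL specifier can be seen by combining the part numbers from sp and p. This is a sketch assuming Python's sqlite3 module and SQLite; the ORDER BY on the first query is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p  (pno TEXT, descr TEXT, color TEXT);
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO p VALUES ('P1','Widget','Blue'), ('P2','Widget','Red'), ('P3','Dongle','Green');
    INSERT INTO sp VALUES ('S1','P1',NULL), ('S2','P1',200), ('S3','P1',1000), ('S3','P2',200);
""")

# UNION removes duplicate rows from the combined result.
distinct_pnos = conn.execute(
    "SELECT pno FROM sp UNION SELECT pno FROM p ORDER BY pno").fetchall()

# UNION ALL keeps every row from both component queries.
all_pnos = conn.execute(
    "SELECT pno FROM sp UNION ALL SELECT pno FROM p").fetchall()

print(distinct_pnos)
print(len(all_pnos))
```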