|
|
Just some thoughts of mine
-
The MERGE syntax I used here has changed. In 2008 RC0 the "WHEN SOURCE NOT MATCHED" clause is replaced with "WHEN NOT MATCHED BY SOURCE" which is supposed to make the meaning clearer.
Although MERGE is part of standard SQL 2003 it's worth noting that the SOURCE NOT MATCHED / NOT MATCHED BY SOURCE clause is not part of the standard. It's an extension added by Microsoft - a very useful one in my view.
MERGE INTO a USING b ON a.keycol = b.keycol WHEN MATCHED THEN UPDATE SET col1 = b.col1, col2 = b.col2, col3 = b.col3 WHEN NOT MATCHED THEN INSERT (keycol, col1, col2, col3) VALUES (b.keycol, b.col1, b.col2, b.col3) WHEN NOT MATCHED BY SOURCE THEN DELETE;
|
-
 SQL Server 2008 Release Candidate 0 (build 10.0.1442.32) was released to MSDN and TechNet subscribers on Friday. It should be available to everyone else very soon via the Connect site. http://msdn.microsoft.com/en-us/subscriptions/default.aspx http://technet.microsoft.com/en-us/subscriptions/default.aspx Apart from the new branding (see above) the goodies in this release include some Management Studio enhancements: object search, T-SQL debugger and PowerShell support. Yes, the debugger is back for those who miss it from SQL Server 2000 days! What's also new is that multiple editions are now included in a single package, so you get Express, Standard, Enterprise, etc instead of just Developer. A product key is required to install the paid-for editions. SQL Server Express Edition 64 bit I have seen no info at all from Microsoft about this but RC0 does include a 64 bit version of Express Edition. This is the first native 64 bit release of Express. SQL Server 2005 Express Edition is 32 bit and WOW only. SELECT @@VERSION; -------------------------------------------------------------------------------------------------------- Microsoft SQL Server 2008 (CTP) - 10.0.1442.32 (X64) May 29 2008 23:52:04 Copyright (c) 1988-2007 Microsoft Corporation Express Edition with Advanced Services (64-bit) on Windows NT 5.2 <X64> (Build 3790: Service Pack 2) (1 row(s) affected)
|
-
The 2008 Formula One season began yesterday with a pretty dramatic Australian Grand Prix in Melbourne. One of the innovations of this year's championship is the new standard Engine Control Unit (ECU), which removes driver aids like traction control and engine braking. Recently I've been working on a project with McLaren Electronic Systems (MES) - the electronics and control systems arm of the McLaren group. MES and Microsoft are official suppliers of the standard ECU now used in all the F1 cars. One of the important functions of the ECU is to log data from 100 or more sensors on the car. The data is broadcast by radio and then received and analysed in real-time by the engineers at the track side. A typical race generates between one and two Gigabytes of engineering data. Together with test results that adds up to around 2 or 3 terabytes of data per year. MES asked Conchango to help create a data platform for their ATLAS software - the application used by McLaren and other teams to receive and analyse data from the ECU. Previously the software was file-based but MES wanted to re-platform it, using SQL Server 2008 as the data store in place of the file system. Key to the solution we built was FILESTREAM. FILESTREAM is a new feature of SQL Server 2008 that supports fast streaming access to BLOB data within the database. FILESTREAM means we can store and retrieve the time-series data from the car at high speed. Client applications that need real-time streaming access to the data can continue to use a file-like API. Simultaneously, other applications such as Excel and Reporting Services can consume historical and near real-time information in query form. This has proved to be a great success story for a new feature of SQL Server 2008. Microsoft have published a case study on the project. The project will also feature at the launch event on Wednesday, March 19 at the ICC, Birmingham. I'll be attending the launch too.
|
-
According to InfoWorld, Bill Gates thinks that declarative modelling of business logic has not caught on because of "weak models". This is a surprise. E.F.Codd's earliest papers on the Relational Model described a relational calculus for the purpose of supporting declarative business logic. Even the SQL standards committee added some powerful features to standard SQL, such as CREATE ASSERTION and generalised CHECK constraints with subqueries. Despite this, InfoWorld identifies the Relational Model as one of the "weak" models referred to by Gates! What is stopping people putting declarative business logic in their data models today? The answer is not the model. It is the DBMS products that are supposed to implement the model. Here's what Terry Halpin said about declarative constraint support in SQL back in 2001 "Although the above SQL syntax [CREATE DOMAIN and CHECK constraint with subquery] is legal as far back as SQL-92, some commercial versions of SQL do not yet support all of this syntax. For data definition, some versions have barely progressed beyond the old SQL-89 standard (which had no domain clauses, and restricted check clauses to conditions on a single row). In practice, some features of a relational schema may need to be specified as a procedural code rather than declaratively."
(Information Modeling and Relational Databases, p411) In 2000, Fabian Pascal wrote: "most commercial DBMSs have not implemented the [SQL] standard's advanced level integrity features. Some implement integrity procedurally via user-written stored procedures expressed in proprietary SQL extensions that can be quite complex, vary across products and have various limitations."(Practical Issues in Database Management, p64) And here's my take on the same problem: http://blogs.conchango.com/davidportas/archive/2007/02/19/Trouble-with-CHECK-Constraints.aspxDatabase practitioners have been saying for years that most SQL DBMSs do a poor job of supporting declarative business logic. Dear Bill: the relational model is just great. Please fix the omissions in the current products before going on to invent whole new ones!
|
-
-
MERGE is a new DML statement in SQL Server 2008. Microsoft have implemented the ISO SQL 2003 and 2007 standard MERGE statement (as seen in Oracle and DB2) and added some extensions of their own. In a nutshell, MERGE allows you to perform simultaneous UPDATE, INSERT and/or DELETE operations on one table. There are new physical operators that combine these operations so that they can be performed in a single scan rather than multiple scans. MERGE has loads of possible applications. For the first time you can assign the contents of one table or query to another in a single operation. The following example requires SQL Server 2008 CTP4. Given this schema and data: CREATE TABLE a (keycol INT PRIMARY KEY, col1 INT NOT NULL, col2 INT NOT NULL, col3 INT NOT NULL); CREATE TABLE b (keycol INT PRIMARY KEY, col1 INT NOT NULL, col2 INT NOT NULL, col3 INT NOT NULL); INSERT INTO a VALUES (1,0,0,0),(2,0,0,0); INSERT INTO b VALUES (1,1,1,1),(3,3,3,3); The following MERGE will populate table a with the same data as table b: MERGE INTO a USING b ON a.keycol = b.keycol WHEN MATCHED THEN UPDATE SET col1 = b.col1, col2 = b.col2, col3 = b.col3 WHEN NOT MATCHED THEN INSERT (keycol, col1, col2, col3) VALUES (b.keycol, b.col1, b.col2, b.col3) WHEN SOURCE NOT MATCHED THEN DELETE; In the relational world this is the operation known as Relational Assignment ie: a := b Unfortunately the SQL syntax is less pretty and requires just a little more typing! MERGE also makes a good "upsert" for application CRUD stored procedures, removing the need for constructs like: IF NOT EXISTS ... INSERT ... Here's an example I created today. It inserts a new Vendor if and only if the name doesn't already exist. Whether the name previously existed or not, it returns the IDENTITY value of the existing or newly inserted row. CREATE PROC dbo.usp_VendorUpsert ( @pVendorID INT OUTPUT, @pVendorName VARCHAR(80) ) AS BEGIN SET NOCOUNT ON; MERGE dbo.Vendor t USING (SELECT @pVendorName ) p(VendorName) ON t.VendorName = @pVendorName WHEN NOT MATCHED THEN INSERT (VendorName) VALUES (@pVendorName) WHEN MATCHED THEN UPDATE SET @pVendorID = VendorID; SET @pVendorID = COALESCE(SCOPE_IDENTITY(),@pVendorID); END RETURN It's amazing that it took nearly 20 years for the SQL standards committee to come up with MERGE. Perhaps the delay is a legacy of the decision to make INSERT, UPDATE and DELETE the basic data update operators. INSERT, UPDATE and DELETE can all be defined as different kinds of relational assignment - assignment being the most basic type of update possible. So arguably MERGE is the more primitive and fundamental data update operator that ought to have been around earlier rather than later.
|
-
According to Computerworld, Mike Stonebraker says "RDBMSes, are 'long in the tooth' and 'should be considered legacy technology'". Knowing Stonebraker's work and reputation that seemed like some pretty startling information! It was less of a surprise to find that Computerworld hadn't quite got it right though... You can read Mike Stonebraker's actual words at www.databasecolumn.com. He makes it clear that he's talking about the technology used for current DBMS implementations, not about the relational model or the SQL model itself. The columnar technology that Stonebraker has championed for years isn't an alternative data model at all; it's a particular mode of storage that in principle could be used by any DBMS, whether relational or not. As far as I know it has been used pretty much exclusively in SQL DBMSs (Sybase IQ for example). Column store is perfect for BI and analytics. What would be even nicer would be to have the choice of column store or row store (or both) in a single DBMS. In fact I'd like to discard altogether the assumption made by most major DBMSs - that a base table automatically defines the main unit of storage. It ought to be possible to map any selection, projection or join of tables into their own storage and to create redundant storage structures where necessary to support the best optimisations for different queries. Today, we have to use three different structures to achieve those things: partitions are a means of storing selections, indexes store projections, and "materialised" views are a slightly roundabout means of persisting joins (why do we have to create a view just in order to change the internal type of storage?). Each of those techniques have their own particular advantages and disadvantages.
|
-
One of my regular gripes about SQL Server concerns the number of Transact SQL features that give nondeterministic results. Unfortunately SQL Server 2005 added to the list of those. One of the new ones is ROW_NUMBER(), which is strictly due to ANSI/ISO rather than Microsoft. ROW_NUMBER() is inherently non-deterministic unless the PARTITION / ORDER BY criteria define a unique criteria for each row - something that isn't mandatory. What I think is surprising however, is Microsoft's approach to updateable views and CTE's, which gives an interesting twist to the effect of ROW_NUMBER(). For example, ROW_NUMBER() can be used to solve the problem of removing duplicates from a set and it's undeniably useful for that purpose. Here's an apparently simple example. The goal is to delete all but one row for each distinct value of Z. CREATE TABLE tbl (x INT NOT NULL PRIMARY KEY, z CHAR(1) NOT NULL); INSERT INTO tbl (x,z) VALUES (1,'A'); INSERT INTO tbl (x,z) VALUES (2,'A'); INSERT INTO tbl (x,z) VALUES (3,'B'); INSERT INTO tbl (x,z) VALUES (4,'B'); WITH t(r) AS /* Updateable CTE */ (SELECT ROW_NUMBER() OVER (PARTITION BY z ORDER BY z) FROM tbl) DELETE t WHERE r > 1; The problem is: which rows get deleted? The CTE doesn't contain a key, in fact it doesn't contain anything except the ROW_NUMBER() in column "r". The ORDER BY clause doesn't reference a key either which means the number returned for each row is nondeterministic. Yet despite these facts the CTE is still updateable. I was surprised when I first discovered that the DELETE statement above is legal. I was even more surprised to find that Books Online is totally silent on the point of exactly when a CTE is updateable and when it isn't. The lack of documentation is a serious omission because those who have been using SQL Server for a few years will know that undefined behaviour tends to get dropped or changed between releases. Right now, my habit is that if I need to update a CTE then I make sure it contains a key. Usually it makes sense for the OVER... ORDER BY clause to use a key also. Like: WITH t(x,r) AS /* Updateable CTE */ (SELECT x, ROW_NUMBER() OVER (PARTITION BY z ORDER BY x) FROM tbl) DELETE t WHERE r > 1; (note the difference: SELECT x, ... ORDER BY x).
|
-
-
Alice: What do you mean by practical languages? Riccardo: select from where. Alice: That's it? Vittorio: Well, there are of course lots of bells and whistles. Sergio: But basically, this forms the core of most practical languages. [1] Oracle is celebrating its 30th birthday this month. They didn't adopt Oracle as the company name until 1982, before which they were known as Relational Software Inc. Of course their very excellent, landmark database product is also "relational" by name and reputation but strictly non-relational in fact. All SQL DBMSs borrow certain features and terminology of the relational model while remaining fundamentally anti-relational in many important respects[2]. IBM began by declaring that its SQL language was in fact "relational" and the name has stuck.
Once or twice I have floated the idea that a new relational database language should be added to SQL Server as an alternative to the T-SQL language. In my opinion the time is right for it. SQL Server already has several data access languages and layers of abstraction - such as SSAS/MDX, LINQ, SSIS - so adding another one isn't exactly a radical idea. What I'm talking about would be a relational language that lives up to Date and Darwen's description of a "D" language. Naturally the new language would imply a new data model as well - the relational data model instead of the SQL one - but I don't think it would be difficult to build it on top of SQL Server's existing engine. In fact one of the key benefits would be that the engine could be freed of certain restrictions imposed by SQL (for example the requirement to support duplicates and column order). Implementing a new language would be a much more modest step than it was to add the OLAP Services engine into SQL Server 7.0 back in 1998. I was recently asked for some examples of problems solved by a true RDBMS that aren't solved already by SQL Server (or other SQL DBMSs for that matter). A full list would be pretty long! Among other things an RDBMS would deliver a truly unified model suitable for both OLTP and OLAP applications. Performance and storage improvements due to better data independence. Support for declarative constraint logic. A more powerful and generic type system. A relational language would also fix some of the more irritating limitations that waste so much SQL programming time today. Duplicate rows and the consequent failure to support guaranteed access. No relational assignment or comparison operators. No multiple assignment. Three-value logic. Is it going to happen? I don't know, but I do think the combination of today's business demands and today's technology make it a real possibility - perhaps more so than at any time in the last 30 years. SQL was a data access language for the 1980's but a 21st century replacement is well overdue. Comments and feeback are welcome! -- [1] From "Foundations of Databases", Chapter 7, by Abiteboul, Hull and Vianu. Addison-Wesley [2] There are numerous references that cover the details. I can recommend "Database in Depth" by Chris Date. "The Askew Wall" by Hugh Darwen, republished in "Relational Database Writings 1989-1991" explains the origins of SQL and some of its limitations.
|
-
Microsoft have officially announced their intention to have Katmai available in 2008. (The title of this post is nothing more than idle speculation. Katmai doesn't have an official name yet.) http://www.microsoft.com/presspass/press/2007/may07/05-09KatmaiPR.mspx Among the features announced are the Declarative Management Framework, the Resource Governor and also support for geographical data, documents and images in the database. Declarative management will allow you to manage your SQL Server configuration across many databases and servers by defining policy rules that are automatically applied, montiored and enforced. It's a great feature for large data centres and complex environments. It's also unlike anything I have seen from the other DBMS vendors. The Resource Governor is a very welcome and overdue improvement. It means that for the first time you will be able to allocate and limit SQL Server resources to support individual workloads with different priorities and needs running concurrently on the same server. That's going to help deliver more consistent performance and availability from every SQL Server instance - for example by making sure that ad-hoc decision support queries can't impact your operational systems by starving them of resources. The new data type support is potentially exciting too. I've been asked more than a few times about support for spatial data within SQL Server. It's something that DB2 and Oracle already offer. You can read more at: http://www.microsoft.com/sql/prodinfo/futureversion/default.mspx And more here as more information becomes available...
|
-
Last week I was at the MVP Summit in Seattle and Redmond, mainly learning about "Katmai" - the next version of SQL Server - from the SQL Server development team. The details of Katmai aren't in the public domain yet but some of the key themes were named a year ago by Paul Flessner: Continuous Availability and Automation; Beyond Relational; End-To-End Insight. One of the things that's obvious to everyone on the MVP programme is just how much effort Microsoft puts into listening to customers and acting on the feedback it receives. The MVP Summit is a fine example of that. It's an occasion when some of the leading lights of the SQL Server world outside Microsoft come together and share ideas with the people actually developing the product. I hope I'll have more to say about Katmai in the future, as and when I'm able to talk about actual features. Meantime, there's more from the summit here: http://www.sqlmag.com/Article/ArticleID/95547/sql_server_95547.html http://www.microsoft.com/presspass/exec/billg/speeches/2007/03-13MVPSummit.mspx
|
-
I blogged here about a problem with SQL Server check constraints and Jamie Collins made a comment that Jet also suffers from the same problem. He goes on to say: "The main drawback for me, however, is that Jet (in common with SQL Server) lacks the S[Q]L-92 feature of deferrable constraints which I feel is a must have to make CHECK constraints workable." I'm glad someone mentioned this because it gives me an opportunity to say why I disagree with it! It seems that ANSI/ISO SQL's concept of deferrable constraints is poor. I don't believe that concept is necessary to make constraints workable and I hope that support for it won't be added to SQL Server any time soon (but thank you for your comment Jamie). In SQL terms, a deferred constraint isn't checked until a transaction is committed. If the constraint is not met at commit time then the transaction is rolled back. That means multiple statements in a transaction can violate deferred constraints without error and execution will continue as normal until the commit statement is reached. But the results of any offending statements are persisted in the database for at least the duration of the transaction and therefore may affect subsequent statements in the same transaction scope. In other words, within the transaction scope the database will appear to be in an inconsistent state. This can cause very similar problems to the one I already demonstrated in SQL Server. Both cases can give rise to incorrect results and in a sense the deferred constraint situation is more serious because it allows update DML statements to happen rather than just erroneously preventing a valid result.* In the example that I referred to in my previous post no deferred constraint checking is necessary. All that is required is for SQL Server to check the constraint properly - i.e. at statement level rather than row level. PRIMARY KEY, FOREIGN KEY and UNIQUE constraints are already checked at statement level; CHECK constraints should be too. The actual use-case for deferrable constraints is different. Deferrable constraints enable SQL to support transactions that require multiple statements in order to return the database to a consistent state. For example, tables A and B may be constrained to have the same number of rows: CREATE ASSERTION foo CHECK ( (SELECT COUNT(*) FROM a) = (SELECT COUNT(*) FROM b) ); How do you then insert rows to both tables, given that an INSERT statement can only target a single table at once? What I would like to see is a better alternative to SQL's "deferred" syntax. I'd propose the option to have a transaction treated as a single unit of work, so that both the constraints and the DML take effect only at commit time - exactly as if the whole transaction were a single statement. The advantage is that the constraint need never be allowed to be violated. Example: BEGIN DEFERRED TRAN; PRINT 'a'; INSERT INTO a VALUES (1); PRINT 'b'; INSERT INTO b VALUES (1); COMMIT TRAN; Result: a b (2 row(s) affected) Notice that using this hypothetical syntax, DEFERRED is a property of the transaction, not the constraint(s). For sure there are some obstacles to making this a reality but I still believe it is preferable to SQL's deferrable constraint model. Another possible solution would be to add support for multiple assignment in SQL. The subject of just whether multiple assigment (or even single assignment!) is supportable within the SQL data model and syntax is worthy of a whole essay in itself. I'm not going to attempt it right now... *Note: The above is influenced by Chris Date who examines this problem in his book, Database In Depth (in Chapter 6, "Why Database Constraint Checking Must Be Immediate"). He proposes a solution which is syntactically different to what I am suggesting but in my opinion the effect is the same.
|
-
In response to Jamie Thomson's T-SQL poser, here's one very efficient method of concatenation in an aggregate query. It uses the FOR XML clause to turn a result set into a string. SELECT DISTINCT id, SUBSTRING( (SELECT ','+name AS [text()] FROM t1 WHERE id = T.id ORDER BY name FOR XML PATH( '' ) ), 2,100) AS concat FROM t1 AS T; This solution is due to Itzik Ben-Gan and others and is described in Inside SQL Server 2005. The main disadvantage is that special characters such as "<" will get delimited with XML-style quoting in the result.
|
-
ANSI/ISO SQL defines CHECK constraints very simply as:
<check constraint definition> ::= CHECK <left paren> <search condition> <right paren> where <search condition> is a boolean expression, including query expressions. In SQL Server, Oracle and many other DBMSs, queries aren't permitted in constraints. CHECK constraints can only reference the table against which they are defined and they can only refer to column names as if they are scalar values - i.e. the values for a single row at a time. This isn't good enough. Often we want to implement business rules that refer to more than one table or to selections or aggregations of rows from the table in question. For example, the rule "Ensure that sum of orders cannot exceed the customer's credit limit" or "Ensure that total X = total Y". In the past, triggers have been one common solution for these types of requirement. But when SQL Server 2000 introduced user-defined functions, some people tried another way to get around the limitations of CHECK constraints: put a query in a scalar function, then reference the function in a CHECK constraint. I've already mentioned something that hints at the flaw in such a solution. SQL Server's CHECK constraints were designed to be evaluated a single row at a time rather than in set-based fashion. This can create big problems if your CHECK constraint relies on a query and if more than one row is affected by any update. What happens is that the constraint gets checked once for each row before the statement completes. That means statement atomicity is broken and the function will be exposed to the database in an inconsistent state. The results are unpredicable and inaccurate. Here's an example. The goal is to define a constraint that guarantees the Accounts total will always balance to zero: CREATE FUNCTION dbo.Accounts_total () RETURNS INT AS BEGIN DECLARE @r INT; SET @r = (SELECT SUM(Amount) FROM dbo.Accounts); RETURN @r; END GO CREATE TABLE dbo.Accounts (AccountNo INT PRIMARY KEY, Amount NUMERIC(10,2) NOT NULL); INSERT INTO dbo.Accounts (AccountNo,Amount) SELECT 1,-50 UNION ALL SELECT 2,50 ; ALTER TABLE dbo.Accounts ADD CONSTRAINT ck_Accounts CHECK (dbo.Accounts_total()=0) /* balance must = 0 */; UPDATE dbo.Accounts SET Amount = -Amount ; INSERT INTO dbo.Accounts (AccountNo,Amount) SELECT 3,-100 UNION ALL SELECT 4,100 ; SELECT * FROM Accounts; Try this both with and without the constraint to get an idea of what's happening. You should see that the UPDATE and the second INSERT statement will cause a constraint violation error even though in logical terms the business rule is never violated (I tested it under SQL Server 2005, SP1). You should also find that this is unaffected by the transaction isolation level because the flaw occurs at row level rather than statement level. To some extent it is possible to write queries that are safe in functions within CHECK constraints but the potential restrictions are fairly severe. In purely relational terms this is obviously a very serious flaw even if it's not actually a bug from Microsoft's point of view.
|
|
|
|