Recently I found a quite common request on StackOverflow. Generalizing the problem, it can be described as the requirement to insert some data into a table only if that data is not there already. Many developers will solve it by executing two steps: first check whether the data is already there, then insert it if it is not.

This approach has a flaw, whatever database you are using, relational or not. The problem, in fact, lies in the algorithm itself. The moment you have two actions, where one depends on the other, you need to make sure that the data used by both doesn't change in the meantime because of some other action done by someone else. Once you have done the first action, you need to be sure that the in-scope data doesn't change before you do the second one; otherwise the result may be wrong or inconsistent.

You need a transaction that provides the needed isolation so that interference won't happen. As the chosen algorithm is made up of two separate steps, you must create a transaction big enough to keep both steps under the same umbrella, so that the two physically separate steps logically work like one. This behavior is also known as atomicity: the two steps are indivisible. Either everything works and the changes are persisted, or nothing works and the changes are undone.

While this will work perfectly from a functional perspective, in a highly concurrent system that must scale to an extreme number of parallel executions you want that protection around your transaction, the isolation, to last for the shortest time possible. For as long as you are using that data in your algorithm, others may have to wait to access it, to avoid interfering with your activity.

How can this problem be solved elegantly and without such big transactions? One option is to use what is known as "Optimistic Concurrency". This approach uses a resource version token, for example an ETag, to make sure that data didn't change between the first and the second step. If data was changed, you simply restart from step 1 and loop until you manage to complete the algorithm or you reach the maximum number of attempts allowed.

What if, for example, we could do both steps in just one command? No need for a big transaction, no need for less-than-optimal loops. With Azure SQL, doing that is easy: you can INSERT a row into a table using the result of a SELECT on that table. Does it start to ring a bell?

By using an INSERT...SELECT command, we can achieve exactly what is needed. One command, without any explicit transaction.

Let's say we have a table, named tags, that stores all the tags associated with a blog post. A tag can be used in different posts, but only once per post. The table would look like the following:

create table ... ( ... constraint pk_tags primary key clustered (..., ...) )

Using such a table as an example, an INSERT...SELECT to implement the insert-if-not-exists logic would look like:

insert into ...

The first SELECT will create a virtual table with the data we want to insert. One or more rows can be created with that technique (it works very nicely up to a few hundred rows; if you need more rows than that, JSON, Table Valued Parameters or Bulk Insert are a better choice). The virtual table will be called s and will have two columns: post_id and tag. Data types will be automatically inferred; if you want a specific data type, you can always CAST the value to make sure the data type you want will be used.

The UPDLOCK is a hint to tell Azure SQL that we are reading with the goal of updating the row. By letting the engine know that, the internal mechanism of lock conversion can be optimized to guarantee the best concurrency and consistency.

The WHERE clause will make sure that only those rows that don't already exist in the target table, tags, will be returned from the virtual table and passed to the INSERT statement. The INSERT statement will do exactly what it says: insert rows into the tags table, if any.

That may sound a little verbose and quite complicated for such a simple operation, so you'll be happy to know that all that code can be simplified a lot using the MERGE statement. The logic behind the scenes is the same, but the code is much leaner in my opinion:

merge into ...

MERGE is a very powerful command and would also allow you to implement the upsert (insert-or-update) behavior in the same command.
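The insert-if-not-exists pattern this post describes can be sketched in T-SQL roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the dbo schema, the column types, the sample values and the HOLDLOCK hint on the MERGE are illustrative choices of this sketch, while the table name tags, the columns post_id and tag, the constraint name pk_tags, the alias s and the UPDLOCK hint come from the text.

```sql
-- Assumed schema and column types; only the names tags, post_id, tag
-- and pk_tags come from the post itself.
create table dbo.tags
(
    post_id int not null,
    tag nvarchar(50) not null,
    constraint pk_tags primary key clustered (post_id, tag)
);

-- Insert-if-not-exists in a single statement.
insert into dbo.tags (post_id, tag)
select s.post_id, s.tag
from (
    -- Virtual table s with the rows we would like to insert.
    -- CAST pins a data type instead of relying on inference.
    values (cast(10 as int), N'azure-sql'),
           (cast(10 as int), N'tsql')
) as s (post_id, tag)
where not exists (
    -- UPDLOCK tells the engine we are reading with the intent to modify.
    select 1
    from dbo.tags as t with (updlock)
    where t.post_id = s.post_id
      and t.tag = s.tag
);

-- The same logic expressed with MERGE.
-- HOLDLOCK is an assumption of this sketch, not taken from the post.
merge into dbo.tags with (holdlock) as t
using (values (10, N'azure-sql')) as s (post_id, tag)
    on t.post_id = s.post_id and t.tag = s.tag
when not matched then
    insert (post_id, tag) values (s.post_id, s.tag);
```

Running either statement twice with the same values should leave the table unchanged the second time. The primary key remains the final safeguard: a concurrent duplicate that slipped past the check would fail with a key violation rather than be stored twice.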