Profile photo for Quora User

The most important reason is that two clients could both select max(primary_key)+1 at almost the same exact instant, both get the same result, and both try to use the same value in their subsequent insert statement. One will execute their insert first, and then the other will fail, because they're trying to insert a primary key value that now exists in the table. This is called a race condition.

To avoid this, you would have to do the following steps for every insert:

  1. Lock the entire table
  2. Select max(primary_key)+1
  3. Insert new row
  4. Release your table lock (maybe not until the end of your transaction)


In an environment where you want multiple concurrent clients inserting rows rapidly, this keeps the table locked for too long. Clients queue up against each other, waiting for the table lock. You end up having a bottleneck in your application.

Auto-increment mechanisms work differently:

  1. Lock the auto-increment generation object
  2. Get the next id
  3. Release the auto-increment lock
  4. Insert new row using the id your thread just generated


The auto-increment generator is also a single resource that the threads are contending for, but the usage of it is extremely brief, and is released immediately after the id is generated, instead of persisting until the end of the transaction.
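
To make the contrast concrete, here is a rough sketch of the two workflows (MySQL syntax assumed; the orders tables and columns are invented for illustration):

  -- Race-prone approach: the whole table has to stay locked, otherwise two
  -- sessions can read the same MAX() and collide.
  LOCK TABLES orders WRITE;
  SELECT COALESCE(MAX(order_id), 0) + 1 INTO @next_id FROM orders;
  INSERT INTO orders (order_id, customer) VALUES (@next_id, 'Alice');
  UNLOCK TABLES;

  -- Auto-increment approach: the id comes from a short-lived internal counter,
  -- so no table lock is needed at all.
  CREATE TABLE orders2 (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer VARCHAR(100) NOT NULL
  );
  INSERT INTO orders2 (customer) VALUES ('Alice');  -- order_id assigned automatically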

Using auto-increment features allows for greater scalability -- i.e. more concurrent clients inserting rows to the same table without queueing unnecessarily.

You said your superior doesn't think there will be a lot of users inserting rows. But it doesn't take a lot of users, it only takes two -- if they're close together. There's an old saying about the likelihood of rare occurrences: one in a million is next Tuesday.

Besides, you haven't described any legitimate reason not to use an auto-increment.

Profile photo for Bob Nightingale

Quora User answered the question. But I’ll pile on anyway.

I just had this same argument on a project. We’re removing a parameters table that kept “counters”; those counters are better served by sequences. We were getting occasional locking when two users were updating the parameters table. Another thing we found out is that the values only had to be unique—gaps in sequences were not fatal because users never saw these keys.

One caution with sequences is that their values never roll backwards. So if you insert a row whose PK value of 10 came from a sequence, then roll back the transaction and do the insert again with a new value from the sequence, that value will be 11, not 10.

Identity columns are new with Oracle 12.1. Before that you would have to use a sequence. This AskTom article discusses the two approaches.

For my application, the identity column doesn’t make sense. I need to get a new primary key value before I insert a row, and then use that same value in child tables to insert foreign key values. I could run these processes in parallel and not worry about who wins the race.
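
A minimal sketch of that parent-and-children pattern (Oracle syntax assumed, since that is the context here; the table and sequence names are invented):

  CREATE SEQUENCE order_seq;

  -- NEXTVAL hands out the new key; CURRVAL reuses the same value for the
  -- children in this session (it could equally be fetched into a variable first).
  INSERT INTO orders (order_id, customer)
    VALUES (order_seq.NEXTVAL, 'Acme');
  INSERT INTO order_lines (order_id, line_no, item)
    VALUES (order_seq.CURRVAL, 1, 'Widget');
  INSERT INTO order_lines (order_id, line_no, item)
    VALUES (order_seq.CURRVAL, 2, 'Gadget');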

Profile photo for Quora User

Why is naming a table's Primary Key column “Id” considered bad practice?

Because it will become a bear trap for the naive developer.

In my (limited) experience, it is rare that an app only has one table. So are you going to always name the Primary Key in all tables “ID”?

You do that, and I’ll bet you a beer (and/or your job) that very soon you will dozily write some SQL which uses the wrong ID field from the wrong table, or perhaps in your code you confuse the ID for table A with the ID for table B, and as a result screw the data rigid. Great way to create bugs that are a pain to diagnose.

So in your Owners table call it Owner_ID, in your Buildings table call it Building_ID, in your Invoices table call it Invoice_ID. You get the idea.
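
For example (generic SQL, just to illustrate the naming; the column lists are made up):

  CREATE TABLE Owners (
    Owner_ID   INT PRIMARY KEY,
    Owner_Name VARCHAR(100) NOT NULL
  );

  CREATE TABLE Buildings (
    Building_ID INT PRIMARY KEY,
    Owner_ID    INT NOT NULL REFERENCES Owners (Owner_ID),  -- same name on both sides
    Address     VARCHAR(200) NOT NULL
  );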

Prevent confusion. There’s too much of it in the world already.

Profile photo for Assistant

Selecting the maximum value of a primary key column instead of using a dedicated ID column or sequence can lead to several issues:

  1. Concurrency Issues: In a multi-user environment, if two transactions try to insert a record at the same time, they might both read the same maximum value before either transaction commits. This can lead to duplicate primary keys, which violates the uniqueness constraint of primary keys.
  2. Performance: Using MAX() requires scanning the entire table to find the highest value, which can be inefficient, especially for large tables. In contrast, using an auto-incrementing ID or a sequence is typically much faster because the database can simply increment the last value without needing to scan the table.
  3. Complexity: Relying on MAX() adds complexity to your code. You need to handle potential race conditions and ensure that the logic for inserting new records is robust. Using an ID column or sequence simplifies this process by providing a guaranteed unique value.
  4. Data Integrity: If records are deleted, the maximum value may not reflect the last inserted ID, leading to potential confusion and integrity issues. An auto-incrementing ID or sequence always generates a new unique value for each insertion.
  5. Portability: Not all database systems handle MAX() in the same way, which can lead to portability issues if you decide to switch databases. Using a standard ID column or sequence is more universally supported.

Conclusion

Using a dedicated ID column or sequence is generally the best practice for generating unique identifiers in databases. It ensures uniqueness, improves performance, simplifies code, and maintains data integrity.

Profile photo for Barry McConnell

It can be but it's not the best choice for a couple reasons. First is that it only guarantees uniqueness within that single instance. Two use cases immediately come to mind that screw that up. First is the merger/acquisition of another company and needing to integrate both systems into one. I guarantee if they both used auto numbers for PKs, you're going to have a lot of work to do to integrate the data. Second is the need to distribute the data across multiple servers. You don't want the Tokyo system to run transactions against the server in Denver so you set up a local database and merge the data later into the master system. Oops, duplicate PKs.

Another reason not to use them is humans’ inherent need to impose meaning on recognizable patterns. Who gets to be employee #1? Why is there a big gap in the sequence? What happens when you run out of numbers? We know that the number is supposed to be meaningless, but good luck enforcing that with sequential numbers.

So what should you use? Well, definitely NOT user entered data like business keys. The first rule of PKs is they should never change, ever. Data entered by humans WILL have errors that need correcting.

The best solution is a GUID. It is globally unique and highly unlikely to be duplicated, and even if it is, the volume will be so low that it will be easy to remedy across systems. There are no discernible patterns for humans to latch onto, and it is definitely not something people will try to memorize even if they do see it.
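
A minimal sketch of a GUID primary key (SQL Server syntax assumed; other databases have their own UUID types and generators, and the table is invented for illustration):

  CREATE TABLE Customers (
    CustomerID   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    CustomerName NVARCHAR(200) NOT NULL
  );

  -- Rows created on different servers get keys that will not collide,
  -- so merging systems later is mostly a matter of copying rows across.
  -- NEWID() could be used instead if insert-order locality doesn't matter.
  INSERT INTO Customers (CustomerName) VALUES (N'Tokyo customer');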

Profile photo for Quora User

Naming a primary key column id is not bad practice.

Naming all primary key columns id is bad practice.

Not all tables should have a single-column primary key with an auto-incrementing integer.

Some tables have a multi-column primary key. Every many-to-many table, for example.

  CREATE TABLE BooksAuthored (
    book_id INT NOT NULL,
    author_id INT NOT NULL,
    PRIMARY KEY (book_id, author_id)
  );

Some tables have a “natural key” instead of an auto-incrementing key.

  CREATE TABLE States (
    state_abbr CHAR(2) PRIMARY KEY,
    state_name VARCHAR(20) NOT NULL
  );

Forcing every table to have a superfluous primary key column id INT AUTO_INCREMENT even when it doesn’t need it was a habit popularized by “opinionated” frameworks like Ruby on Rails. Their theory was that forcing every table to follow a pattern makes some coding tasks more consistent. Nevertheless, there are cases where forcing that pattern is inappropriate and makes other coding tasks more complex and inefficient.

The bad practice isn’t using id sometimes — it’s the insistence on arbitrary rules even when they’re not helpful.

Profile photo for Greg Moore

Put me down as the almost always NO camp.

The problem with an auto-incremented column as Primary Key is that, honestly, it's very easy to destroy database integrity with it, and it can make testing a royal pain in the butt.

I’ve often seen people say, “well it’s good when you have no natural key and are just using it as some sort of lookup table.” Yeah, no.

I’ll give you an example loosely based on an issue I encountered in the real world. Client was storing some data, I believe it was colors being used in some sort of lookup table, so I’ll use that as an example.

So table looked something like:

  PK  Color   Usage
  ==  ======  ==========
  1   Red     Foreground
  2   White   Background
  3   Yellow  Frame

So inserts looked something like

  Insert Color_LK (Color, Usage) Values ('Red','Foreground')
  Insert Color_LK (Color, Usage) Values ('White','Background')
  Insert Color_LK (Color, Usage) Values ('Yellow','Frame')

Easy peazy.

But imagine two scenarios:

The first: the inserts get run twice in production. Now you have duplicate rows, but they're not duplicates, because the Primary Key says they're not. There's nothing to keep someone from entering the same data multiple times. Or worse, different data:

  Insert Color_LK (Color, Usage) Values ('Blue','Foreground')

Now which one is the Foreground?

Ah, but Greg, obviously my key is 1 for the foreground, so the Blue value (which would be associated with 4) won't matter.

But you don't know that, do you? Because perhaps in Dev, the developer did

  Insert Color_LK (Color, Usage) Values ('Red','Foreground')
  Insert Color_LK (Color, Usage) Values ('White','Background')
  Insert Color_LK (Color, Usage) Values ('Yellow','Frame')

But in UAT the UI person insisted that they change the Foreground color to Blue. The developer decides to update their original script, so they remove the first insert and add a new one.

  Insert Color_LK (Color, Usage) Values ('White','Background')
  Insert Color_LK (Color, Usage) Values ('Yellow','Frame')
  Insert Color_LK (Color, Usage) Values ('Blue','Foreground')

The UI/UX person signs off on it and this gets deployed to production.

So now in Dev you have

  PK  Color   Usage
  ==  ======  ==========
  1   Red     Foreground
  2   White   Background
  3   Yellow  Frame

But in UAT and Production

  PK  Color   Usage
  ==  ======  ==========
  1   White   Background
  2   Yellow  Frame
  3   Blue    Foreground

Will the REAL value for Foreground please stand up.

Now, before you object and say, “that’ll never happen,” I can tell you from years of experience that I’ve seen this sort of thing happen all the time with lookup tables that use auto-incrementing primary keys.

Or I’ve seen, even worse:

  PK  Color   Usage
  ==  ======  ==========
  1   Red     Foreground
  2   White   Background
  3   Yellow  Frame
  4   Blue    Foreground

Which has its own problems!

So, the solution:

Personally, I’d probably just make the Usage Column the primary key.

But if for some reason you insist on using integers for your primary key (and there can be arguments for and against that), then don't use auto-incrementing. Hand-code the value, i.e. Foreground WILL ALWAYS be 1, Background WILL ALWAYS be 2, etc.

So your inserts become

  Insert Color_LK (PK, Color, Usage) Values (1, 'Red','Foreground')
  Insert Color_LK (PK, Color, Usage) Values (2, 'White','Background')
  Insert Color_LK (PK, Color, Usage) Values (3, 'Yellow','Frame')

Now you ensure that, no matter the environment, 1 is always Foreground, etc.
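
Sketching both options described above (generic SQL; the second table name is invented so the two variants can sit side by side):

  -- Option 1: make the meaningful column the primary key.
  -- (Note: USAGE is a reserved word in some databases and may need quoting.)
  CREATE TABLE Color_LK (
    Usage VARCHAR(20) PRIMARY KEY,
    Color VARCHAR(20) NOT NULL
  );

  -- Option 2: keep an integer key, but hand-code it so every environment agrees.
  CREATE TABLE Color_LK2 (
    PK    INT PRIMARY KEY,              -- no auto-increment
    Color VARCHAR(20) NOT NULL,
    Usage VARCHAR(20) NOT NULL UNIQUE   -- still prevents duplicate usages
  );
  INSERT INTO Color_LK2 (PK, Color, Usage) VALUES (1, 'Red', 'Foreground');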

Profile photo for Art Kagel

The real problem with having the primary key column (assuming there is indeed a single-column primary key) always named ‘id’ in every table is that when these keys are referenced in another table, the name of the foreign key column in the dependent table will be different from the name of the column it references, and I have a problem with that. I am a firm believer that an object’s name should be consistent in all contexts where possible (a table that has multiple foreign key references to the same table is an obvious “not possible” exception) and that each name should uniquely refer to only one object.

Call me rigid, but I do think that it is approaching insanity, and certainly provokes confusion, when every primary key is just called ‘id’. I agree that calling the primary key of the person table person_id is a bit redundant, but that is what you will have to call it when you reference it as a foreign key, so just call it person_id in the person table as well!

I have the same objection to using generic column names like ‘account_num’ in multiple tables if the “account number” referred to in the general_ledger table is not the “account number” in the customer table is not the “account number” in the vendor table. When you reference those columns in other tables, you will either have to prepend the name with some prefix indicating which account number it refers to or there will be confusion because developers, for example, are not interested in parsing foreign key constraints! So I prefer “gl_account_num”, “cust_account_num”, and “vend_account_num” be used in the defining table and all references consistently when I design a database schema.
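
A sketch of the naming being argued for (generic SQL; the tables and columns are illustrative):

  CREATE TABLE general_ledger (
    gl_account_num INT PRIMARY KEY,
    description    VARCHAR(200) NOT NULL
  );

  CREATE TABLE journal_entry (
    entry_id       INT PRIMARY KEY,
    -- the referencing column keeps exactly the same name as the column it references
    gl_account_num INT NOT NULL REFERENCES general_ledger (gl_account_num),
    amount         DECIMAL(12,2) NOT NULL
  );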

Profile photo for Karl Jørgensen

“Everyone has a unique name” ??

No they do not.

I have a (weird) namesake in the USA - apparently the guy has a fetish for inflatable reindeer (no, I’m not making this up…). It has caused me some awkward moments in job interviews…

One good quality of primary keys is that they do not change. Changing the value of a primary key column is (relatively speaking) a lot of work: All references to it must change. And people change names all the time.

Also: Primary keys should not be excessively long. They should only be long “enough” to ensure uniqueness. And names can be long. Just look at this venerable person:

His name is Adolph Blaine Charles David Earl Frederick Gerald Hubert Irvin John Kenneth Lloyd Martin Nero Oliver Paul Quincy Randolph Sherman Thomas Uncas Victor William Xerxes Yancy Zeus Wolfeschlegel­steinhausen­bergerdorff­welche­vor­altern­waren­gewissenhaft­schafers­wessen­schafe­waren­wohl­gepflege­und­sorgfaltigkeit­beschutzen­vor­angreifen­durch­ihr­raubgierig­feinde­welche­vor­altern­zwolfhundert­tausend­jahres­voran­die­erscheinen­von­der­erste­erdemensch­der­raumschiff­genacht­mit­tungstein­und­sieben­iridium­elektrisch­motors­gebrauch­licht­als­sein­ursprung­von­kraft­gestart­sein­lange­fahrt­hinzwischen­sternartig­raum­auf­der­suchen­nachbarschaft­der­stern­welche­gehabt­bewohnbar­planeten­kreise­drehen­sich­und­wohin­der­neue­rasse­von­verstandig­menschlichkeit­konnte­fortpflanzen­und­sich­erfreuen­an­lebenslanglich­freude­und­ruhe­mit­nicht­ein­furcht­vor­angreifen­vor­anderer­intelligent­geschopfs­von­hinzwischen­sternartig­raum Sr. Even wikipedia refused to use the full name in the URL.

Many names can be spelled in numerous ways - e.g. many Arabic names have multiple correct spellings in the Latin alphabet.

Many people have multiple names. Not as in “first name”, “middle name” - but “different names they go by”. Quite legally.

Are you OK with primary keys in UTF-8? Because how else are you going to write محمد ?

And for some reason, you believe that everybody has a name!?

As a programmer, you must read - and understand - Falsehoods Programmers Believe About Names. It should open your eyes.

Profile photo for Art Kagel

This is the biggest religious debate in the database world. Some advocate that every table MUST use a surrogate key. Others that one MUST always use the natural key. I am more pragmatic myself.

A natural key is, well, natural to the data. I prefer to use the natural key when it makes sense. However, when the natural key is a lengthy sequence of bytes, say if it is made up of multiple character columns, it can produce an index that is not ideal for detail-to-parent joins. Also, using it as a foreign key can make related table rows wider than they need to be. In those cases I would definitely go for adding an auto-incrementing integer surrogate key as the primary key and making the natural key an alternate unique key.

Profile photo for Alan Mellor

Sometimes you can; if your application enforces unique usernames at sign-up, then you might be able to use the username.

I still wouldn’t though. Use a surrogate key (like an integer from a sequence generator) and merely have the username as an indexed column you can look up (or join on).
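
Something like this, as a sketch (standard-SQL identity syntax assumed; MySQL would use AUTO_INCREMENT instead, and the names are made up):

  CREATE TABLE users (
    user_id  INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate, never changes
    username VARCHAR(100) NOT NULL UNIQUE                   -- looked up, but free to change
  );

  -- A rename touches one row and no foreign keys.
  UPDATE users SET username = 'new_name' WHERE user_id = 42;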

The main problem with using any value from the real world is that even if you can guarantee uniqueness when the account is created, you cannot guarantee there will be no change.

It is very common for people to change surnames, especially on marriage for example.

Not only can this cause a conflict with an existing previously unique name, but it can wreak havoc with key relationships.

Just don’t do it.

Profile photo for Grant Fritchey

“Is it always recommended to have an auto-incremented column as the PRIMARY KEY in a database?”

The key word in the question is “always.”

The short answer is, no. It is not always recommended to have an auto-incremented column as the primary key.

For example, you have an interim table, a table for a many-to-many join. The natural key is the primary key from each of the other tables. Let’s assume, just for our argument here, that neither of those is a natural key. Instead, each of them is an artificial key, auto-incremented as we’re discussing. Those two columns make the PK on the interim table. You’re suggesting, for no reason, to add a second key, just so that we can “always” have an auto-increment key? No. Makes no sense. Adds overhead, and makes the whole design cumbersome; plus, unless other tables are related to the interim table, that artificial key will never be used for any queries. It exists to do nothing. You still have to enforce the unique keys between the two tables to ensure the many-to-many relationship isn’t broken as data gets removed from the system.

This isn’t even getting into the natural key versus artificial key debate. Myself, I lean hard towards artificial keys being better, for a number of reasons. However, as in the example above, natural keys have a reason to be used. Plus, even if you have an artificial key, you still must enforce the natural key or you’re looking at the potential for bad data.

Profile photo for Peter Zet

I always use auto numbering (auto increment) as a primary key.

This is why I prefer this:

  1. It’s the most compact form, so it saves on data storage and performs better. The chance of making mistakes when correcting something manually is also minimized.
  2. I always use the same name for the primary key field. When I have a table customer, the key’s name is customer_id. By doing this I never have to look up the field’s name (and I already know the field type).
  3. It’s a sequential way of numbering, so lower numbers are always older. This can be convenient when something went wrong from a certain moment onwards, etc.

I have used this method for many years and never experienced any disadvantages. However, there might be rare situations where looking for other solutions is a better idea.

Profile photo for Jack Lion Heart

The main disadvantage to this (that I'm aware of) is that all other indexes will reference the rows by your primary key, so you're increasing the size (and decreasing the block density) of all of your other indexes.

The other thing you want to be careful with is that it's generally preferable to write your rows in sequential/increasing primary key order -- and having lots of independent components to your primary key increases the chances that you're not doing that.

If either of these issues sounds like it may apply, you should consider adding an "id" column to your table that serves as your primary key, and adding a separate UNIQUE KEY index on what would have otherwise been your primary key.
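
As a sketch of that first suggestion (MySQL/InnoDB syntax assumed; the table, columns, and index names are invented):

  -- The composite natural key is demoted to a UNIQUE KEY; a compact id becomes
  -- the PRIMARY KEY that secondary indexes reference internally.
  CREATE TABLE page_views (
    id        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    site_id   INT NOT NULL,
    url_hash  BINARY(16) NOT NULL,
    viewed_at DATETIME NOT NULL,
    UNIQUE KEY uk_page_views (site_id, url_hash, viewed_at),
    KEY ix_viewed_at (viewed_at)   -- now stores the 8-byte id, not the wide composite key
  ) ENGINE=InnoDB;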

However, if neither of these sounds like a concern, I think it's fine.

There's some more discussion here:
http://www.mysqlperformanceblog.com/2006/10/03/long-primary-key-for-innodb-tables/

Profile photo for Joe Celko

No, just the opposite. Auto-incrementing is how you avoid the relational model. You are mimicking a magnetic tape file! Did you ever read a book on RDBMS? The definition of a key is that it is a subset of attributes such that it is unique and not null for every row in a table. By absolute textbook definition, an auto increment can never be a key because it can never be a column. What attribute does it model? None! It has to do with physical storage, not a logical model.

This is also why we don't like to say primary key anymore. This declaration is a leftover from when SQL was built on magnetic tape files and disk storage systems, not logical models.

Profile photo for Quora User

Use a UUID when your primary key values are generated in a decentralized way. In other words, if multiple apps are creating data and assigning identifying keys, without any ability for the apps to sync with each other to avoid using the same primary key value.

Example: data is created on users' clients (browsers and mobile devices) before posting the data back to a website. They are doing this concurrently, and have no way of knowing how many other clients are doing the same thing. So if they each have their own idea of the current primary key value, and each just tries to use the next higher value, they're sure to conflict.

Using a UUID makes the chances of conflict near zero, because each client is generating a new random UUID in a sufficiently large domain (128 bits) that the chances of two clients generating the same value is insignificant (at least for the scale of data we work on today, in the early 21st century).
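
For example (PostgreSQL syntax assumed; gen_random_uuid() is built in from version 13, and the table is invented for illustration):

  CREATE TABLE notes (
    note_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- server-side generation
    body    TEXT NOT NULL
  );

  -- Or the client generates the id up front, offline, and simply supplies it:
  INSERT INTO notes (note_id, body)
  VALUES ('2f1f9d8e-3c44-4b7b-9a57-6a1f0c1d2e3b', 'written offline on a phone');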

The downsides include:

  • 128 bits is larger than the 32 bits used for traditional integers, so the data requires more space. UUIDs are often stored as strings of hex digits, so they take even more space.
  • The bulky primary key is repeated wherever other tables contain foreign keys referencing the UUID.
  • Inserting new data into a table randomly, instead of appending to the end, may be inefficient, depending on the implementation of the DBMS.
  • UUIDs take longer to type, so doing ad hoc queries or mocking up test data during development is a PITA.
Profile photo for Grant Fritchey

Why?

It might make sense, but I need to understand why someone thinks it's needed. Let's take a different tack. You can add an IDENTITY property to a column. You can also have a SEQUENCE populate a column. You can have both. Or two SEQUENCE columns. Or three. Why would you need that? One example might be: you want to expose a number to the people using the app, but you, appropriately, don't want the artificial primary key value to suddenly acquire meaning. So, instead, you add the SEQUENCE. Now there are two different counters, one with meaning and one without.
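
A sketch of that two-counter setup (SQL Server syntax assumed; the names are made up):

  CREATE SEQUENCE dbo.OrderNumberSeq START WITH 10000 INCREMENT BY 1;

  CREATE TABLE dbo.Orders (
    OrderID     INT IDENTITY(1,1) PRIMARY KEY,                             -- internal, meaningless key
    OrderNumber INT NOT NULL DEFAULT (NEXT VALUE FOR dbo.OrderNumberSeq),  -- number shown to users
    CustomerID  INT NOT NULL
  );

  INSERT INTO dbo.Orders (CustomerID) VALUES (42);  -- both counters filled in automatically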

Without the why, I can’t tell you whether or not it makes sense.

Asked to answer, but I really didn’t. Sorry.

Profile photo for Thomas Barkman

Absolutely, yes! You can set a primary key on character columns. Imagine you've got a table of your favorite books. The title of the book, a character column, could serve as your primary key. Every book has a unique title, right? Well, it's not purely black and white though.

While using primary keys on character columns is indeed possible, it's not always the best pick. There may be performance issues when dealing with large datasets. See, text comparison takes more time than integer comparison.

Also, remember the off chance that two books might have the same title? That's a bummer, mate! You've got yourself a duplicate key situation. That's why in many scenarios, using a unique identifier like an integer can be a safer bet.

But hey, don't let that get you down. If you're sure that your character values are distinct, go ahead and set that primary key. Just be mindful of the potential hiccups. Get creative with your databases, and keep the questions coming - you're learning heaps!

Profile photo for Art Kagel

No skilled database designer will EVER “just <use> those columns as part of your primary key”!

One might create an index that starts with the low-cardinality column (say ‘gender’ or ‘marital_status’) and includes the primary key column(s) in addition, but that would be a separate index from the primary key constraint and its supporting index. The primary key, by its very definition, is the column or set of columns that uniquely identifies a record in a table. Adding an attribute that has nothing to do with identifying a row breaks that purpose.

Now, while having an index solely on a low-cardinality column is not necessarily a good thing, it is not necessarily a bad thing either. Some systems can apply multiple indexes to filter a search or support a join, and in those systems having several single-column indexes, even on low-cardinality columns, can improve the performance of those searches and joins while also reducing storage consumption and improving insert, update, and delete performance as a bonus. IBM’s Informix, for example, can use multiple indexes for searches and for data-warehouse-style queries against a Star or Snowflake schema, and such queries will perform better than the equivalent designs using compound key indexes instead.

Profile photo for Hugo Kornelis (he/him)

(Asked to answer)

First: Never concatenate columns. It's a violation of First Normal Form and it will cause huge pains later.

Second: A PRIMARY KEY (or in fact any candidate key) does not have to be single column. If it takes three columns to reach uniqueness, then you define a key on those three columns combined. This is called a composite key.

Third: There are extremely few cases where the data does not have what I call a “business key”, the data used in the business to distinguish between elements in the group. Just ask people in the business what they use to identify an individual customer, product, contact, etc., and you will find a good primary key candidate.

Fourth: There are cases where it is appropriate to create a surrogate key (unique values in a single column generated by the computer). Having a composite PRIMARY KEY can be one of those reasons. But it's called surrogate for a reason: it should be added to the table in addition to the business key and both should be declared as keys to enforce their uniqueness. The surrogate keys can then be used in other tables to implement foreign key relationships; these values are preferably not used externally. Surrogate keys are typically used to improve join performance or save space but at the cost of needing more joins for some reports.
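
A sketch of that arrangement (generic SQL with standard identity syntax; the tables and columns are invented):

  CREATE TABLE product (
    product_id   INT GENERATED ALWAYS AS IDENTITY,     -- surrogate key
    manufacturer VARCHAR(50) NOT NULL,
    model_code   VARCHAR(50) NOT NULL,
    CONSTRAINT pk_product PRIMARY KEY (product_id),
    CONSTRAINT uk_product_business UNIQUE (manufacturer, model_code)  -- business key still enforced
  );

  CREATE TABLE order_line (
    order_id   INT NOT NULL,
    line_no    INT NOT NULL,
    product_id INT NOT NULL REFERENCES product (product_id),  -- FK uses the surrogate
    PRIMARY KEY (order_id, line_no)
  );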

So in short: never concatenate columns; find the business key; add a surrogate key when needed but not to replace the business key.

Profile photo for Andrew Droffner

Each table row has exactly one primary key. A primary key column is both unique and not null. This primary key is used to identify a row uniquely.

Other columns may be unique to prevent input errors. Also, the UNIQUE constraint creates an index in the background which may make queries faster.


Sample Table

  CREATE TABLE PLAYER (
    ID INTEGER PRIMARY KEY,
    FIRST VARCHAR(10),
    LAST VARCHAR(100) NOT NULL,
    JERSEY_NUMBER INTEGER UNIQUE NOT NULL
  );

PRIMARY KEY Example

The SELECT below should return only one row because it matches the PRIMARY KEY column.

  -- ID IS PRIMARY KEY
  SELECT * FROM PLAYER
  WHERE ID = 1;

UNIQUE Example

The new player row is inserted into the database. The player’s jersey number cannot be the same as anyone else’s, and the UNIQUE constraint makes sure this is true.

  INSERT INTO PLAYER
    (ID, FIRST, LAST, JERSEY_NUMBER)
  VALUES (2, 'Jane', 'Seymour', 12);
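
To see the constraint do its work, a follow-up insert that reuses jersey number 12 would be rejected (the exact error message varies by database):

  -- Fails with a unique-constraint violation: JERSEY_NUMBER 12 is already taken.
  INSERT INTO PLAYER
    (ID, FIRST, LAST, JERSEY_NUMBER)
  VALUES (3, 'John', 'Elway', 12);
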
Profile photo for Don Humberson

Fair warning this is a rather long answer to a relatively short question.

This answer will be confined to the design of tables for relational databases. I am going to restate your question a bit in the hope that what I'm reading into it is the actual question you were asking. If it's not please tell me where it's not and I'll try to focus my answer differently.

I would restate your question as “Why should we use surrogate keys instead of natural keys when designing tables for database applications?”

Think of relational design as a stepwise process. An early part of that process is identifying the business entities that go into a specific data domain, the attributes naturally contained within each entity, and how we are going to uniquely identify each distinct set of values contained within a given entity (each such set of values is called a tuple). Notice that I have not used the terms table, column, or row? That is not simply being fussy, it is done to emphasize that we are dealing with a set of abstractions from business processes that we will be turning into database tables and their relations.

If we're being careful about this first level of design we will only identify attributes within each entity which are meaningful within the context of that entity. Simple examples would be things like personal name or family name in a person entity, company legal name or company doing business as name in a vendor entity, or main body color in an automobile inventory entity. We will keep identifying such attributes until we have captured all of the attributes that our business partners think of as part of a given entity, and then we may add additional attributes that we realize are an implicit part of the way the business people view this particular entity. One of the ways that we know we have captured enough information about a given entity is when the business partner is able to identify every tuple in that entity uniquely. Since we have only captured meaningful information about the entity in these attributes the set of attributes which uniquely identify the tuple must in fact be attributes which are meaningful in the business context.

Designers call these meaningful identifiers natural keys. If a single attribute uniquely identifies the tuple we call that a single part natural key, if it takes more than one we call that a multiple part or multi-part natural key. Whatever number of parts are needed, the set of attributes needed to identify that tuple constitute primary key of that entity. By definition this primary key is the natural key.

Everything we have done so far falls into the domain of something called logical data design. There are a large number of rules that help a database designer construct a robust design which follows best practices for relational database design. Once this logical design is built, reviewed, and validated it is time to turn to the next step, which is called physical design. The physical design will in turn be used to produce the actual code which will create the real database in whatever brand of relational database the client desires. There are a very large number of additional decisions needed in this step, but a crucial part of making the leap from logical to physical correctly is to ensure that all of the information captured in the logical model is correctly reflected into the physical.

For the specific answer to why use surrogate keys, I'm not going to dive into the process of normalization or show the complexities that can occur when multipart natural keys are propagated across several generations of children in identifying relations. Suffice it to say that one additional driver towards surrogation is the size to which a grandchild or great-grandchild's natural key can grow.

This is where the process of surrogation may occur. Under some circumstances the designer may choose to replace a natural key with a simple numerical sequence, often just an integer that is incremented once for each row inserted in a given table. Such an integer still uniquely identifies each row, but it carries no information whatsoever about the actual values of the columns that were derived from logical attributes. That is where the term surrogate key comes from, it is a legitimate primary key which stands in for, or is a surrogate for, the natural key of that table.

Typical reasons for surrogation include replacing multiple long strings in a natural key with a much smaller and more easily manipulated single value, replacing the natural key with a value that is guaranteed to always be one step larger than any previous value, and replacing a natural key whose business processes may allow value changes over time with a value which will remain fixed for the entire lifespan of the given row. Such surrogate keys may also produce much more compact indexes, especially where the database software uses copies of the key as a way to enhance index usability. There is an entire additional discipline of physical design called dimensional database modeling which virtually demands surrogate keys for the tables it calls dimensions. Please be aware that surrogate keys do also have drawbacks.
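
A sketch of that kind of surrogation (generic SQL; the insurance-flavored names are purely illustrative):

  -- Natural key: (policy_number, endorsement_number, effective_date) -- long and multi-part.
  CREATE TABLE policy_endorsement (
    endorsement_id     INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate
    policy_number      VARCHAR(20) NOT NULL,
    endorsement_number INT NOT NULL,
    effective_date     DATE NOT NULL,
    UNIQUE (policy_number, endorsement_number, effective_date)        -- natural key still enforced
  );

  -- Children carry one small integer instead of the whole three-part key.
  CREATE TABLE endorsement_document (
    document_id    INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    endorsement_id INT NOT NULL REFERENCES policy_endorsement (endorsement_id)
  );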

I hope going into this level of detail has been helpful. If I get the chance I will expand this answer with a few web references that can go far more deeply into the specific pieces that I have summarized above.

Profile photo for Curtis Ruck

UUID's are valuable because they are not sequential. In a distributed database, or even a normal relational database, generating incrementing numbers typically creates a LOCK, in that only one insert can generate a number at any one time. Various RDBMS's have mechanisms (preallocation) to get around this limitation, but then in most implementations you lose the assumed one-upness of the value.

With UUID's you can generate the UUID inside the database or within the application. Due to the quantity of available UUID's, you can safely assume that your generated UUID is unique, quickly store it, and use it inside other records within the same transaction.

Lastly, I've found that optimizing UUID's at the database level involves storing them in byte form instead of as strings. This means the UUID takes 16 bytes to store, instead of 32 (without hyphens) or 36 bytes (with hyphens). When these UUID's in byte form are indexed, the index is at least 50% smaller, which at high record counts shaves some time off a database fetch due to fewer B-Tree blocks being required to store the index, and a shorter tree height.
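
A sketch of that byte-form storage (MySQL 8.0 syntax assumed, where UUID_TO_BIN and BIN_TO_UUID handle the conversion; the table is invented):

  CREATE TABLE events (
    event_id BINARY(16) PRIMARY KEY,   -- 16 bytes instead of a 36-character string
    payload  JSON NOT NULL
  );

  INSERT INTO events (event_id, payload)
  VALUES (UUID_TO_BIN(UUID()), '{"k": 1}');

  -- Convert back to the readable form when reading.
  SELECT BIN_TO_UUID(event_id) AS event_id, payload FROM events;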

Profile photo for Martin O'Shea

For an actual database table, given its placement and relationship(s) with other table(s) in the database, use of one or more columns for a primary key, to uniquely identify each row in the table, is absolutely essential. This uniqueness is one of the basic concepts of the relational database paradigm known as entity integrity.

The only occasion when I would not use primary keys is for data transfer, i.e. (1) when data is being either imported from a flat file into a database table, or (2) data is being exported from a database table into a flat file. In both cases the database table(s) used would have pre-defined primary key(s), but the flat files would not, even if they are temporarily resident in the database.

Profile photo for Greg Kemnitz

In addition to the schema-design-based relational DB reasons to have a PK…

You’d rarely not use a PK, particularly in MySQL InnoDB. If you don’t have a PK in InnoDB, it will auto-generate one for you and store it in the table and in any indexes based on the table. As this auto-generated PK is composed from all the column values in your row, it can take a lot of space in base table storage and in index storage.

Some situations where you may not need a PK:

  • Work tables and temporary tables. These are often stored using the MEMORY storage engine and usually don't have secondary indexes. You *can* have a PK on these, but when you do, it's usually because you're actively using the work table PK for some purpose such as guaranteeing uniqueness.
  • Related to the above, you are loading a CSV file from disk or doing some other big load or import, and you will build the final table using two or more load tables. You often load into intermediate tables that may be quite large but that you'll DROP once you're done with the import (see the staging-table sketch below). Note that this is also the only situation where I might use the MyISAM engine in MySQL.
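
A rough sketch of that staging pattern in MySQL, with hypothetical table and file names; the final table keeps its primary key while the throwaway load table does without one:

-- Final table with a proper surrogate PK
CREATE TABLE imported_customers (
  customer_id  INT AUTO_INCREMENT PRIMARY KEY,
  company_name VARCHAR(200) NOT NULL,
  country_code CHAR(2) NOT NULL,
  signup_date  DATE NOT NULL
);

-- Throwaway load table: no PK, no secondary indexes, dropped after the import
CREATE TABLE load_customers (
  company_name VARCHAR(200),
  country_code CHAR(2),
  signup_date  DATE
) ENGINE=MyISAM;

-- Bulk-load the CSV (file location and privileges permitting)
LOAD DATA INFILE '/tmp/customers.csv'
INTO TABLE load_customers
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES;

-- Build the real table from the staging data, then throw the staging table away
INSERT INTO imported_customers (company_name, country_code, signup_date)
SELECT company_name, country_code, signup_date FROM load_customers;

DROP TABLE load_customers;
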
Profile photo for Mike

It's been a long time since I worked on databases, so I'll answer as best I can. I assume you're working with a well-normalized database. We had several tables that had more than one column with unique values. Some of these values were simpler than others - i.e. a unique number such as a social security number vs. a street address. It was easier and more efficient to use the unique number. Other times we had many tables that held information about people. Each person had a unique ID, and it was easier to write table-join SQL statements where these tables joined on that unique ID.

Profile photo for Quora User

Yes, an auto-incrementing identity column as the primary key is quite appropriate for most uses.

However, if you want to use the ID as a token in a URL, but you don’t want anybody to be able to do unexpected — or unauthorized — things, like changing the numerical ID in the URL to try and access a resource incorrectly, then using a UUID would be very appropriate.

Hope that helps!

Profile photo for Andrew Clarke

This is because all RDBMSs use a paging system to prevent it. This paging system was developed long before RDBMSs, because it was essential for early computers that had very little memory. At its simplest, data is kept in blocks or pages on disk, and when the application asks for a block of data, it is returned a pointer to the block in memory. The application has a finite number of blocks, and if the requested block is not in memory, the least-recently-used (LRU) memory block is released and that memory is reallocated to read in the requested page of data from disk. As far as the application/database is concerned it has the resources to read the data, but under the covers it is all being swapped around between disk and memory to cope.

Naturally, it isn't a free lunch. Other threads may be trying to do the same thing, and those other threads lose all their caching. Hogging the resources is always possible, but it is bad news in a multi-user system, just like eating all the pie in the canteen. You should request only the data you actually need in any multi-user database, even when it doesn't seem to max out the disk, because it sure slows everything down.
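
To make the closing point concrete, a hypothetical contrast (the orders table and its columns are made up): the first query drags every page of the table through the buffer pool, while the second touches only the few pages it needs, ideally via an index on (customer_id, placed_at).

-- Pulls every column of every row through the page cache, evicting other users' hot pages
SELECT * FROM orders;

-- Reads far fewer pages: only the needed columns, filtered and limited
SELECT order_id, total
FROM orders
WHERE customer_id = 42
ORDER BY placed_at DESC
LIMIT 20;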

Profile photo for Arthur Fuller

In MySQL, a sequence is a list of integers generated in ascending order i.e., 1,2,3…. Most often, sequences are used to create unique identifiers, but if that’s what you’re after, relax. MySQL can generate one for you automatically. You use the AUTO_INCREMENT attribute when specifying the column in your CREATE TABLE command:

CREATE TABLE employees (
  emp_no     INT AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(50),
  last_name  VARCHAR(50)
);

A table can have only one AUTO_INCREMENT column.

Sometimes you need to know the last number generated. To find this, use the LAST_INSERT_ID() function.
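
For example, continuing with the employees table above (employee_badges is a hypothetical related table):

-- Hypothetical related table keyed by the generated employee number
CREATE TABLE employee_badges (
  emp_no     INT NOT NULL,
  badge_code VARCHAR(20) NOT NULL
);

INSERT INTO employees (first_name, last_name) VALUES ('Ada', 'Lovelace');

-- LAST_INSERT_ID() returns the AUTO_INCREMENT value generated by this connection's last insert
SELECT LAST_INSERT_ID();

-- It is often used immediately to key a related row
INSERT INTO employee_badges (emp_no, badge_code)
VALUES (LAST_INSERT_ID(), 'B-1001');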

Profile photo for Yinso Chen

UUIDs are great for creating globally unique identifiers that you can count on for basically zero collision [1]. That implies the following:

  • You can create them in different databases with confidence that you can merge the table without collisions
  • You can create them anywhere in your app rather than relying on the database to create them (so you don't need to insert first and then retrieve the IDs)
  • Because they are globally unique - you can consider them as truly immutable IDs for the records (usually by convention, since most databases don't provide write-once columns [2]).

They are a great candidate for unique keys if there are no natural keys for the data in question.

UUIDs are not without downsides, of course:

  • 32-digit numbers are cumbersome to remember and use, especially if your customers are going to see them - for those records it's better to have an alternative ID format.
  • not all databases support storing UUIDs in byte form, so they can take up more space. In practice this is not a big deal, since most databases have ample disk space and UUIDs are seldom the main culprit for space consumption.


The biggest problem, though, is using UUIDs with clustered indexes. A clustered index orders the underlying physical table, so when new UUIDs arrive out of order they cause page splits and fragmentation as rows are slotted into place, which can have a big performance impact on large tables [3]. If I recall correctly, MS SQL Server (and possibly MySQL) uses a clustered index for the primary key unless you specify otherwise.

My usual practice is to use an int/bigint auto-increment for primary keys, and a separate unique key with a UUID, so that I don't have to remember which database defaults to clustered primary keys.
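
A sketch of that layout in MySQL (the table is hypothetical): the sequential bigint keeps the clustered index append-only, while the UUID is exposed externally through a secondary unique index.

CREATE TABLE accounts (
  id          BIGINT AUTO_INCREMENT PRIMARY KEY,  -- clustered key, always inserted in order
  public_uuid BINARY(16) NOT NULL,                -- identifier handed out to clients / other systems
  email       VARCHAR(255) NOT NULL,
  UNIQUE KEY uk_accounts_public_uuid (public_uuid)
);

-- External lookups go through the secondary unique index
SELECT id, email
FROM accounts
WHERE public_uuid = UUID_TO_BIN('3f0c9c2e-8a4b-4c6e-9a2d-1f2e3d4c5b6a');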

---
1 - The quality of the generation depends on the random number generator (for v4 UUIDs) - see Universally unique identifier.

2 - This can be enforced programmatically in systems that support triggers.

3 - Note this is not specific to UUIDs; any data type that doesn't get inserted sequentially will cause the same problem, so it's better to think of this as a property of clustered indexes instead - use them with care.

Profile photo for Alan Groves

I once had to create a table with the merged contents from two other tables. I wanted to keep the original keys, so the new table would continue to join to other tables. The problem was, both tables used auto numbers (identity columns) with the same range.

In the end, I negated one of the keys so the combined key remained unique. Some trickery using absolute values was needed to perform joins. SQL Server always performs better with numbers, so this worked well.
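
A rough reconstruction of that trick, with hypothetical table names since the original schema isn't given in the answer:

CREATE TABLE orders_siteA (order_id INT PRIMARY KEY, customer_name VARCHAR(100));
CREATE TABLE orders_siteB (order_id INT PRIMARY KEY, customer_name VARCHAR(100));
CREATE TABLE combined_orders (order_id INT PRIMARY KEY, customer_name VARCHAR(100));

-- Negate site B's keys so the two overlapping identity ranges stay unique in the merged table
INSERT INTO combined_orders (order_id, customer_name)
SELECT order_id, customer_name FROM orders_siteA
UNION ALL
SELECT -order_id, customer_name FROM orders_siteB;

-- Joining the merged rows back to site B then needs ABS() to undo the negation
SELECT c.order_id, b.customer_name
FROM combined_orders AS c
JOIN orders_siteB AS b ON b.order_id = ABS(c.order_id)
WHERE c.order_id < 0;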

Profile photo for Greg Kemnitz

Actually, it depends on what you're doing with the table. In most situations, a fat primary key is a Bad Idea, especially in MySQL InnoDB, where the primary key is stored as the lookup key in secondary indexes.

A "wide" primary key (whether based on a single long column value or multiple columns) will make secondary indexes larger and more expensive, causing inefficient use of buffer pool, slower inserts and other operations, and general bad performance if you have a lot of secondary indexes.

That said, a carefully-chosen multi-column primary key *can* be a very good thing for specific types of use-cases. In InnoDB, the primary key effectively specifies the ordering and clustering in the B-Tree used to store the row. If you have a multi-column PK, the first column will be used as the major ordering, the second column as the secondary ordering within the major ordering, and so on.

An example of where this can be useful is if you have a set of User records with a largish set of properties that are loaded to your app every time a User logs in (a fairly common situation in online games, blog sites, etc). A table structured something like this

create table UserProperties (
  UserID bigint,
  PropertyTypeID bigint,
  PropertyValue varchar(256),
  primary key (UserID, PropertyTypeID, PropertyValue)
) engine=InnoDB;

would work nicely, as all the properties of a given User will be clustered adjacently in InnoDB storage, guaranteeing that a User property load will happen quickly, and also likely guaranteeing that Property updates for an active User will be on pages loaded in the buffer pool.
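
For instance, the per-user property load described above becomes a short range scan over adjacent rows in the clustered index (the query itself is just an illustrative example):

-- All of one user's properties sit next to each other in the clustered B-Tree,
-- so this reads only a handful of adjacent pages
SELECT PropertyTypeID, PropertyValue
FROM UserProperties
WHERE UserID = 12345;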

If you need to do a lot of property type searches that span Users, this structure would be less good as you'll need an expensive secondary index, but other solutions may be useful as per your application requirements.

Profile photo for Joachim Pense

There seem to be different schools. Those with a database theory background often say that artificial IDs are evil, and only natural keys are allowed, because they represent reality. Many practitioners say that natural keys are evil, because they are rarely unique, are often more volatile than expected, and can be long and bulky, too.

For data warehouses, natural keys are a no-no; generated surrogate keys should be used, even for things like dates. The reason: bulkiness has a high impact on the huge fact tables involved, and because the data are stored for a very long time, stability of the keys is even more of a requirement.
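
A common warehouse sketch of that idea, with hypothetical table names: even the calendar date gets a small surrogate key, and the (huge) fact table stores only compact, stable surrogates.

CREATE TABLE dim_date (
  date_key       INT PRIMARY KEY,    -- surrogate key, e.g. 20250131
  calendar_date  DATE NOT NULL,
  calendar_year  SMALLINT NOT NULL,
  calendar_month TINYINT NOT NULL
);

CREATE TABLE fact_sales (
  date_key    INT NOT NULL,          -- small, stable surrogate instead of a bulky natural key
  product_key INT NOT NULL,
  quantity    INT NOT NULL,
  amount      DECIMAL(12,2) NOT NULL
);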
