Profile photo for Quora User

The most important reason is that two clients could both select max(primary_key)+1 at almost the same exact instant, both get the same result, and both try to use the same value in their subsequent insert statement. One will execute their insert first, and then the other will fail, because they're trying to insert a primary key value that now exists in the table. This is called a race condition.

To avoid this, you would have to do the following steps for every insert:

  1. Lock the entire table
  2. Select max(primary_key)+1
  3. Insert new row
  4. Release your table lock (maybe not until the end of your transaction)


In an environment where you want multiple concurrent clients inserting rows rapidly, this keeps the table locked for too long. Clients queue up against each other, waiting for the table lock. You end up having a bottleneck in your application.

Auto-increment mechanisms work differently:

  1. Lock the auto-increment generation object
  2. Get the next id
  3. Release the auto-increment lock
  4. Insert new row using the id your thread just generated


The auto-increment generator is also a single resource that the threads are contending for, but the usage of it is extremely brief, and is released immediately after the id is generated, instead of persisting until the end of the transaction.
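
To make the contrast concrete, here is a rough sketch of the two workflows (MySQL syntax assumed; the orders tables and columns are invented for illustration):

  -- Race-prone approach: the whole table has to stay locked, otherwise two
  -- sessions can read the same MAX() and collide.
  LOCK TABLES orders WRITE;
  SELECT COALESCE(MAX(order_id), 0) + 1 INTO @next_id FROM orders;
  INSERT INTO orders (order_id, customer) VALUES (@next_id, 'Alice');
  UNLOCK TABLES;

  -- Auto-increment approach: the id comes from a short-lived internal counter,
  -- so no table lock is needed at all.
  CREATE TABLE orders2 (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer VARCHAR(100) NOT NULL
  );
  INSERT INTO orders2 (customer) VALUES ('Alice');  -- order_id assigned automatically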

Using auto-increment features allows for greater scalability -- i.e. more concurrent clients inserting rows to the same table without queueing unnecessarily.

You said your superior doesn't think there will be a lot of users inserting rows. But it doesn't take a lot of users, it only takes two -- if they're close together. There's an old saying about the likelihood of rare occurrences: one in a million is next Tuesday.

Besides, you haven't described any legitimate reason not to use an auto-increment.

Profile photo for Bob Nightingale

Quora User answered the question. But I’ll pile on anyway.

I just had this same argument on a project. We’re removing a parameters table that kept “counters”; those counters are better served by sequences. We were getting occasional locking when two users were updating the parameters table. Another thing we found out is that the values only had to be unique—gaps in sequences were not fatal because users never saw these keys.

One caution with sequences is that their values never roll backwards. So if you insert a row whose PK value of 10 came from a sequence, then roll back the transaction and do the insert again with a new value from the sequence, that value will be 11, not 10.

Identity columns are new with Oracle 12.1. Before that you would have to use a sequence. This AskTom article discusses the two approaches.

For my application, the identity column doesn’t make sense. I need to get a new primary key value before I insert a row, and then use that same value in child tables to insert foreign key values. I could run these processes in parallel and not worry about who wins the race.
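
A minimal sketch of that parent-and-children pattern (Oracle syntax assumed, since that is the context here; the table and sequence names are invented):

  CREATE SEQUENCE order_seq;

  -- NEXTVAL hands out the new key; CURRVAL reuses the same value for the
  -- children in this session (it could equally be fetched into a variable first).
  INSERT INTO orders (order_id, customer)
    VALUES (order_seq.NEXTVAL, 'Acme');
  INSERT INTO order_lines (order_id, line_no, item)
    VALUES (order_seq.CURRVAL, 1, 'Widget');
  INSERT INTO order_lines (order_id, line_no, item)
    VALUES (order_seq.CURRVAL, 2, 'Gadget');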

Profile photo for Quora User

Why is naming a table's Primary Key column “Id” considered bad practice?

Because it will become a bear trap for the naive developer.

In my (limited) experience, it is rare that an app only has one table. So are you going to always name the Primary Key in all tables “ID”?

You do that, and I’ll bet you a beer (and/or your job) that very soon you will dozily write some SQL which uses the wrong ID field from the wrong table, or perhaps in your code you confuse the ID for table A with the ID for table B, and as a result screw the data rigid. Great way to create bugs that are a pain to diagnose.

So in your Owners table call it Owner_ID, in your Buildings table call it Building_ID, in your Invoices table call it Invoice_ID. You get the idea.
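
For example (generic SQL, just to illustrate the naming; the column lists are made up):

  CREATE TABLE Owners (
    Owner_ID   INT PRIMARY KEY,
    Owner_Name VARCHAR(100) NOT NULL
  );

  CREATE TABLE Buildings (
    Building_ID INT PRIMARY KEY,
    Owner_ID    INT NOT NULL REFERENCES Owners (Owner_ID),  -- same name on both sides
    Address     VARCHAR(200) NOT NULL
  );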

Prevent confusion. There’s too much of it in the world already.

Profile photo for Assistant

Selecting the maximum value of a primary key column instead of using a dedicated ID column or sequence can lead to several issues:

  1. Concurrency Issues: In a multi-user environment, if two transactions try to insert a record at the same time, they might both read the same maximum value before either transaction commits. This can lead to duplicate primary keys, which violates the uniqueness constraint of primary keys.
  2. Performance: Using MAX() requires scanning the entire table to find the highest value, which can be inefficient, especially for large tables. In contrast, using an auto-incrementing ID or a sequence is typically much faster because the database can simply increment the last value without needing to scan the table.
  3. Complexity: Relying on MAX() adds complexity to your code. You need to handle potential race conditions and ensure that the logic for inserting new records is robust. Using an ID column or sequence simplifies this process by providing a guaranteed unique value.
  4. Data Integrity: If records are deleted, the maximum value may not reflect the last inserted ID, leading to potential confusion and integrity issues. An auto-incrementing ID or sequence always generates a new unique value for each insertion.
  5. Portability: Not all database systems handle MAX() in the same way, which can lead to portability issues if you decide to switch databases. Using a standard ID column or sequence is more universally supported.

Conclusion

Using a dedicated ID column or sequence is generally the best practice for generating unique identifiers in databases. It ensures uniqueness, improves performance, simplifies code, and maintains data integrity.

Profile photo for Barry McConnell

It can be but it's not the best choice for a couple reasons. First is that it only guarantees uniqueness within that single instance. Two use cases immediately come to mind that screw that up. First is the merger/acquisition of another company and needing to integrate both systems into one. I guarantee if they both used auto numbers for PKs, you're going to have a lot of work to do to integrate the data. Second is the need to distribute the data across multiple servers. You don't want the Tokyo system to run transactions against the server in Denver so you set up a local database and merge the data later into the master system. Oops, duplicate PKs.

Another reason not to use them is humans’ inherent need to impose meaning on recognizable patterns. Who gets to be employee #1? Why is there a big gap in the sequence? What happens when you run out of numbers? We know that the number is supposed to be meaningless, but good luck enforcing that with sequential numbers.

So what should you use? Well, definitely NOT user entered data like business keys. The first rule of PKs is they should never change, ever. Data entered by humans WILL have errors that need correcting.

The best solution is a GUID. It is globally unique and highly unlikely to be duplicated, and even if it is, the volume will be so low that it will be easy to remedy across systems. There are no discernible patterns for humans to latch onto, and it is definitely not something people will try to memorize even if they do see it.
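
A minimal sketch of a GUID primary key (SQL Server syntax assumed; other databases have their own UUID types and generators, and the table is invented for illustration):

  CREATE TABLE Customers (
    CustomerID   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    CustomerName NVARCHAR(200) NOT NULL
  );

  -- Rows created on different servers get keys that will not collide,
  -- so merging systems later is mostly a matter of copying rows across.
  -- NEWID() could be used instead if insert-order locality doesn't matter.
  INSERT INTO Customers (CustomerName) VALUES (N'Tokyo customer');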

Profile photo for Quora User

Naming a primary key column id is not bad practice.

Naming all primary key columns id is bad practice.

Not all tables should have a single-column primary key with an auto-incrementing integer.

Some tables have a multi-column primary key. Every many-to-many table, for example.

  CREATE TABLE BooksAuthored (
    book_id INT NOT NULL,
    author_id INT NOT NULL,
    PRIMARY KEY (book_id, author_id)
  );

Some tables have a “natural key” instead of an auto-incrementing key.

  CREATE TABLE States (
    state_abbr CHAR(2) PRIMARY KEY,
    state_name VARCHAR(20) NOT NULL
  );

Forcing every table to have a superfluous primary key column id INT AUTO_INCREMENT even when it doesn’t need it was a habit popularized by “opinionated” frameworks like Ruby on Rails. Their theory was that forcing every table to follow a pattern makes some coding tasks more consistent. Nevertheless, there are cases where forcing that pattern is inappropriate and makes other coding tasks more complex and inefficient.

The bad practice isn’t using id sometimes — it’s the insistence on arbitrary rules even when they’re not helpful.

Profile photo for Greg Moore

Put me down as the almost always NO camp.

The problem with an auto-incremented column as Primary Key is that, honestly, it's very easy to destroy database integrity with it, and it can make testing a royal pain in the butt.

I’ve often seen people say, “well it’s good when you have no natural key and are just using it as some sort of lookup table.” Yeah, no.

I’ll give you an example loosely based on an issue I encountered in the real world. Client was storing some data, I believe it was colors being used in some sort of lookup table, so I’ll use that as an example.

So table looked something like:

  PK  Color   Usage
  ==  ======  ==========
  1   Red     Foreground
  2   White   Background
  3   Yellow  Frame

So inserts looked something like

  Insert Color_LK (Color, Usage) Values ('Red','Foreground')
  Insert Color_LK (Color, Usage) Values ('White','Background')
  Insert Color_LK (Color, Usage) Values ('Yellow','Frame')

Easy peazy.

But imagine two scenarios:

The first: the inserts get run twice in production. Now you have duplicate rows, but they're not duplicates, because the Primary Key says they're not. There's nothing to keep someone from entering the same data multiple times. Or worse, different data:

  Insert Color_LK (Color, Usage) Values ('Blue','Foreground')

Now which one is the Foreground?

Ah, but Greg, obviously my key is 1 for the foreground, so the Blue value (which would be associated with 4) won't matter.

But you don't know that, do you? Because perhaps in Dev, the developer did

  Insert Color_LK (Color, Usage) Values ('Red','Foreground')
  Insert Color_LK (Color, Usage) Values ('White','Background')
  Insert Color_LK (Color, Usage) Values ('Yellow','Frame')

But in UAT the UI person insisted that they change the Foreground color to Blue. The developer decides to update their original script, so they remove the first insert and add a new one.

  Insert Color_LK (Color, Usage) Values ('White','Background')
  Insert Color_LK (Color, Usage) Values ('Yellow','Frame')
  Insert Color_LK (Color, Usage) Values ('Blue','Foreground')

The UI/UX person signs off on it and this gets deployed to production.

So now in Dev you have

  PK  Color   Usage
  ==  ======  ==========
  1   Red     Foreground
  2   White   Background
  3   Yellow  Frame

But in UAT and Production

  PK  Color   Usage
  ==  ======  ==========
  1   White   Background
  2   Yellow  Frame
  3   Blue    Foreground

Will the REAL value for Foreground please stand up.

Now, before you object and say, “that’ll never happen,” I can tell you from years of experience that I’ve seen this sort of thing happen all the time with lookup tables that use auto-incrementing primary keys.

Or I’ve seen, even worse:

  PK  Color   Usage
  ==  ======  ==========
  1   Red     Foreground
  2   White   Background
  3   Yellow  Frame
  4   Blue    Foreground

Which has its own problems!

So, the solution:

Personally, I’d probably just make the Usage Column the primary key.

But if for some reason you insist on using integers for your primary key (and there can be arguments for and against that), then don't use auto-incrementing. Hand-code the value, i.e. Foreground WILL ALWAYS be 1, Background WILL ALWAYS be 2, etc.

So your inserts become

  Insert Color_LK (PK, Color, Usage) Values (1, 'Red','Foreground')
  Insert Color_LK (PK, Color, Usage) Values (2, 'White','Background')
  Insert Color_LK (PK, Color, Usage) Values (3, 'Yellow','Frame')

Now you ensure that, no matter the environment, 1 is always Foreground, etc.
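
Sketching both options described above (generic SQL; the second table name is invented so the two variants can sit side by side):

  -- Option 1: make the meaningful column the primary key.
  -- (Note: USAGE is a reserved word in some databases and may need quoting.)
  CREATE TABLE Color_LK (
    Usage VARCHAR(20) PRIMARY KEY,
    Color VARCHAR(20) NOT NULL
  );

  -- Option 2: keep an integer key, but hand-code it so every environment agrees.
  CREATE TABLE Color_LK2 (
    PK    INT PRIMARY KEY,              -- no auto-increment
    Color VARCHAR(20) NOT NULL,
    Usage VARCHAR(20) NOT NULL UNIQUE   -- still prevents duplicate usages
  );
  INSERT INTO Color_LK2 (PK, Color, Usage) VALUES (1, 'Red', 'Foreground');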

Profile photo for Art Kagel

The real problem with having the primary key column (assuming there is indeed a single-column primary key) always named ‘id’ in every table is that when these keys are referenced in another table, the name of the foreign key column in the dependent table will be different from the name of the column it references, and I have a problem with that. I am a firm believer that an object’s name should be consistent in all contexts where possible (a table that has multiple foreign key references to the same table is an obvious “not possible” exception) and that each name should uniquely refer to only one object.

Call me rigid, but I do think that it is approaching insanity, and certainly provokes confusion, when every primary key is just called ‘id’. I agree that calling the primary key of the person table person_id is a bit redundant, but that is what you will have to call it when you reference it as a foreign key, so just call it person_id in the person table as well!

I have the same objection to using generic column names like ‘account_num’ in multiple tables if the “account number” referred to in the general_ledger table is not the “account number” in the customer table is not the “account number” in the vendor table. When you reference those columns in other tables, you will either have to prepend the name with some prefix indicating which account number it refers to or there will be confusion because developers, for example, are not interested in parsing foreign key constraints! So I prefer “gl_account_num”, “cust_account_num”, and “vend_account_num” be used in the defining table and all references consistently when I design a database schema.
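
A sketch of the naming being argued for (generic SQL; the tables and columns are illustrative):

  CREATE TABLE general_ledger (
    gl_account_num INT PRIMARY KEY,
    description    VARCHAR(200) NOT NULL
  );

  CREATE TABLE journal_entry (
    entry_id       INT PRIMARY KEY,
    -- the referencing column keeps exactly the same name as the column it references
    gl_account_num INT NOT NULL REFERENCES general_ledger (gl_account_num),
    amount         DECIMAL(12,2) NOT NULL
  );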

Profile photo for Karl Jørgensen

“Everyone has a unique name” ??

No they do not.

I have a (weird) namesake in the USA - apparently the guy has a fetish for inflatable reindeer (no, I’m not making this up…). It has caused me some awkward moments in job interviews…

One good quality of primary keys is that they do not change. Changing the value of a primary key column is (relatively speaking) a lot of work: All references to it must change. And people change names all the time.

Also: Primary keys should not be excessively long. They should only be long “enough” to ensure uniqueness. And names can be long. Just look at this venerable person:

His name is Adolph Blaine Charles David Earl Frederick Gerald Hubert Irvin John Kenneth Lloyd Martin Nero Oliver Paul Quincy Randolph Sherman Thomas Uncas Victor William Xerxes Yancy Zeus Wolfeschlegel­steinhausen­bergerdorff­welche­vor­altern­waren­gewissenhaft­schafers­wessen­schafe­waren­wohl­gepflege­und­sorgfaltigkeit­beschutzen­vor­angreifen­durch­ihr­raubgierig­feinde­welche­vor­altern­zwolfhundert­tausend­jahres­voran­die­erscheinen­von­der­erste­erdemensch­der­raumschiff­genacht­mit­tungstein­und­sieben­iridium­elektrisch­motors­gebrauch­licht­als­sein­ursprung­von­kraft­gestart­sein­lange­fahrt­hinzwischen­sternartig­raum­auf­der­suchen­nachbarschaft­der­stern­welche­gehabt­bewohnbar­planeten­kreise­drehen­sich­und­wohin­der­neue­rasse­von­verstandig­menschlichkeit­konnte­fortpflanzen­und­sich­erfreuen­an­lebenslanglich­freude­und­ruhe­mit­nicht­ein­furcht­vor­angreifen­vor­anderer­intelligent­geschopfs­von­hinzwischen­sternartig­raum Sr. Even wikipedia refused to use the full name in the URL.

Many names can be spelled in numerous ways - e.g. many Arabic names have multiple correct spellings in the Latin alphabet.

Many people have multiple names. Not as in “first name”, “middle name” - but “different names they go by”. Quite legally.

Are you OK with primary keys in UTF-8? Because how else are you going to write محمد ?

And for some reason, you believe that everybody has a name!?

As a programmer, you must read - and understand - Falsehoods Programmers Believe About Names. It should open your eyes.

Profile photo for Art Kagel

This is the biggest religious debate in the database world. Some advocate that every table MUST use a surrogate key. Others that one MUST always use the natural key. I am more pragmatic myself.

A natural key is, well, natural to the data. I prefer to use the natural key when it makes sense. However, when the natural key is a lengthy sequence of bytes, say if it is made up of multiple character columns, it can produce an index that is not ideal for detail-to-parent joins. Also, using it as a foreign key can make related table rows wider than they need to be. In those cases I would definitely go for adding an auto-incrementing integer surrogate key as the primary key and making the natural key an alternate unique key.

Profile photo for Alan Mellor

Sometimes you can; if your application enforces unique usernames at sign-up, then you might be able to use the username.

I still wouldn’t though. Use a surrogate key (like an integer from a sequence generator) and merely have the username as an indexed column you can look up (or join on).
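
Something like this, as a sketch (standard-SQL identity syntax assumed; MySQL would use AUTO_INCREMENT instead, and the names are made up):

  CREATE TABLE users (
    user_id  INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate, never changes
    username VARCHAR(100) NOT NULL UNIQUE                   -- looked up, but free to change
  );

  -- A rename touches one row and no foreign keys.
  UPDATE users SET username = 'new_name' WHERE user_id = 42;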

The main problem with using any value from the real world is that even if you can guarantee uniqueness when the account is created, you cannot guarantee there will be no change.

It is very common for people to change surnames, especially on marriage for example.

Not only can this cause a conflict with an existing previously unique name, but it can wreak havoc with key relationships.

Just don’t do it.

Profile photo for Grant Fritchey

“Is it always recommended to have an auto-incremented column as the PRIMARY KEY in a database?”

The key word in the question is “always.”

The short answer is, no. It is not always recommended to have an auto-incremented column as the primary key.

For example, you have an interim table, a table for a many-to-many join. The natural key is the primary key from each of the other tables. Let’s assume, just for our argument here, that neither of those is a natural key. Instead, each of them is an artificial key, auto-incremented as we’re discussing. Those two columns make the PK on the interim table. You’re suggesting, for no reason, to add a second key, just so that we can “always” have an auto-increment key? No. Makes no sense. Adds overhead, and makes the whole design cumbersome; plus, unless other tables are related to the interim table, that artificial key will never be used for any queries. It exists to do nothing. You still have to enforce the unique keys between the two tables to ensure the many-to-many relationship isn’t broken as data gets removed from the system.

This isn’t even getting into the natural key versus artificial key debate. Myself, I lean hard towards artificial keys being better, for a number of reasons. However, as in the example above, natural keys have a reason to be used. Plus, even if you have an artificial key, you still must enforce the natural key or you’re looking at the potential for bad data.

Profile photo for Peter Zet

I always use auto numbering (auto increment) as a primary key.

This is why I prefer this:

  1. It’s the most compact form, so it saves on data storage and performs better. The chance of making mistakes when correcting something manually is also minimized.
  2. I always use the same name for the primary key field. When I have a table customer, the key’s name is customer_id. By doing this I never have to look up the field’s name (and I already know the field type).
  3. It’s a sequential way of numbering, so lower numbers are always older. This can be convenient when something went wrong from a certain moment onwards, etc.

I have used this method for many years and never experienced any disadvantages. However, there might be rare situations where looking for other solutions is a better idea.

Profile photo for Jack Lion Heart

The main disadvantage to this (that I'm aware of) is that all other indexes will reference the rows by your primary key, so you're increasing the size (and decreasing the block density) of all of your other indexes.

The other thing you want to be careful with is that it's generally preferable to write your rows in sequential/increasing primary key order -- and having lots of independent components to your primary key increases the chances that you're not doing that.

If either of these issues sounds like it may apply, you should consider adding an "id" column to your table that serves as your primary key, and adding a separate UNIQUE KEY index on what would have otherwise been your primary key.
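
As a sketch of that first suggestion (MySQL/InnoDB syntax assumed; the table, columns, and index names are invented):

  -- The composite natural key is demoted to a UNIQUE KEY; a compact id becomes
  -- the PRIMARY KEY that secondary indexes reference internally.
  CREATE TABLE page_views (
    id        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    site_id   INT NOT NULL,
    url_hash  BINARY(16) NOT NULL,
    viewed_at DATETIME NOT NULL,
    UNIQUE KEY uk_page_views (site_id, url_hash, viewed_at),
    KEY ix_viewed_at (viewed_at)   -- now stores the 8-byte id, not the wide composite key
  ) ENGINE=InnoDB;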

However, if neither of these sounds like a concern, I think it's fine.

There's some more discussion here:
http://www.mysqlperformanceblog.com/2006/10/03/long-primary-key-for-innodb-tables/

Profile photo for Joe Celko

No, just the opposite. Auto-incrementing is how you avoid the relational model. You are mimicking a magnetic tape file! Did you ever read a book on RDBMS? The definition of a key is that it is a subset of attributes such that it is unique and not null for every row in a table. By absolute textbook definition, an auto increment can never be a key because it can never be a column. What attribute does it model? None! It has to do with physical storage, not a logical model.

This is also why we don't like to say primary key anymore. This declaration is a leftover from when SQL was built on magnetic tape files and disk storage systems, not logical models.

Profile photo for Quora User

Use a UUID when your primary key values are generated in a decentralized way. In other words, if multiple apps are creating data and assigning identifying keys, without any ability for the apps to sync with each other to avoid using the same primary key value.

Example: data is created on users' clients (browsers and mobile devices) before posting the data back to a website. They are doing this concurrently, and have no way of knowing how many other clients are doing the same thing. So if they each have their own idea of the current primary key value, and each just tries to use the next higher value, they're sure to conflict.

Using a UUID makes the chances of conflict near zero, because each client is generating a new random UUID in a sufficiently large domain (128 bits) that the chances of two clients generating the same value is insignificant (at least for the scale of data we work on today, in the early 21st century).
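
For example (PostgreSQL syntax assumed; gen_random_uuid() is built in from version 13, and the table is invented for illustration):

  CREATE TABLE notes (
    note_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- server-side generation
    body    TEXT NOT NULL
  );

  -- Or the client generates the id up front, offline, and simply supplies it:
  INSERT INTO notes (note_id, body)
  VALUES ('2f1f9d8e-3c44-4b7b-9a57-6a1f0c1d2e3b', 'written offline on a phone');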

The downsides include:

  • 128 bits is larger than the 32 bits used for traditional integers, so the data requires more space. UUIDs are often stored as strings of hex digits, so they take even more space.
  • The bulky primary key is repeated wherever other tables contain foreign keys referencing the UUID.
  • Inserting new data into a table randomly, instead of appending to the end, may be inefficient, depending on the implementation of the DBMS.
  • UUIDs take longer to type, so doing ad hoc queries or mocking up test data during development is a PITA.
Profile photo for Grant Fritchey

Why?

It might make sense, but I need to understand why someone thinks it's needed. Let's take a different tack. You can add an IDENTITY property to a column. You can also have a SEQUENCE populate a column. You can have both. Or two SEQUENCE columns. Or three. Why would you need that? One example might be: you want to expose a number to the people using the app, but you, appropriately, don't want the artificial primary key value to suddenly acquire meaning. So, instead, you add the SEQUENCE. Now there are two different counters, one with meaning and one without.
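
A sketch of that two-counter setup (SQL Server syntax assumed; the names are made up):

  CREATE SEQUENCE dbo.OrderNumberSeq START WITH 10000 INCREMENT BY 1;

  CREATE TABLE dbo.Orders (
    OrderID     INT IDENTITY(1,1) PRIMARY KEY,                             -- internal, meaningless key
    OrderNumber INT NOT NULL DEFAULT (NEXT VALUE FOR dbo.OrderNumberSeq),  -- number shown to users
    CustomerID  INT NOT NULL
  );

  INSERT INTO dbo.Orders (CustomerID) VALUES (42);  -- both counters filled in automatically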

Without the why, I can’t tell you whether or not it makes sense.

Asked to answer, but I really didn’t. Sorry.

Profile photo for Thomas Barkman

Absolutely, yes! You can set a primary key on character columns. Imagine you've got a table of your favorite books. The title of the book, a character column, could serve as your primary key. Every book has a unique title, right? Well, it's not purely black and white though.

While using primary keys on character columns is indeed possible, it's not always the best pick. There may be performance issues when dealing with large datasets. See, text comparison takes more time than integer comparison.

Also, remember the off chance that two books might have the same title? That's a bummer, mate! You've got yourself a duplicate key situation. That's why in many scenarios, using a unique identifier like an integer can be a safer bet.

But hey, don't let that get you down. If you're sure that your character values are distinct, go ahead and set that primary key. Just be mindful of the potential hiccups. Get creative with your databases, and keep the questions coming - you're learning heaps!

Profile photo for Art Kagel

No skilled database designer will EVER “just <use> those columns as part of your primary key”!

One might create an index that starts with the low-cardinality column (say ‘gender’ or ‘marital_status’) and includes the primary key column(s) in addition, but that would be a separate index from the primary key constraint and its supporting index. The primary key, by its very definition, is the column or set of columns that uniquely identifies a record in a table. Adding an attribute that has nothing to do with identifying a row breaks that purpose.

Now, while having an index solely on a low-cardinality column is not necessarily a good thing, it is not necessarily a bad thing either. Some systems can apply multiple indexes to filter a search or support a join, and in those systems having several single-column indexes, even on low-cardinality columns, can improve the performance of those searches and joins while also reducing storage consumption and improving insert, update, and delete performance as a bonus. IBM’s Informix, for example, can use multiple indexes for searches and for data-warehouse-style queries against a Star or Snowflake schema, and such queries will perform better than the equivalent designs using compound key indexes instead.

Profile photo for Hugo Kornelis (he/him)

(Asked to answer)

First: Never concatenate columns. It's a violation of First Normal Form and it will cause huge pains later.

Second: A PRIMARY KEY (or in fact any candidate key) does not have to be single column. If it takes three columns to reach uniqueness, then you define a key on those three columns combined. This is called a composite key.

Third: There are extremely few cases where the data does not have what I call a “business key”, the data used in the business to distinguish between elements in the group. Just ask people in the business what they use to identify an individual customer, product, contact, etc., and you will find a good primary key candidate.

Fourth: There are cases where it is appropriate to create a surrogate key (unique values in a single column generated by the computer). Having a composite PRIMARY KEY can be one of those reasons. But it's called surrogate for a reason: it should be added to the table in addition to the business key and both should be declared as keys to enforce their uniqueness. The surrogate keys can then be used in other tables to implement foreign key relationships; these values are preferably not used externally. Surrogate keys are typically used to improve join performance or save space but at the cost of needing more joins for some reports.
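
A sketch of that arrangement (generic SQL with standard identity syntax; the tables and columns are invented):

  CREATE TABLE product (
    product_id   INT GENERATED ALWAYS AS IDENTITY,     -- surrogate key
    manufacturer VARCHAR(50) NOT NULL,
    model_code   VARCHAR(50) NOT NULL,
    CONSTRAINT pk_product PRIMARY KEY (product_id),
    CONSTRAINT uk_product_business UNIQUE (manufacturer, model_code)  -- business key still enforced
  );

  CREATE TABLE order_line (
    order_id   INT NOT NULL,
    line_no    INT NOT NULL,
    product_id INT NOT NULL REFERENCES product (product_id),  -- FK uses the surrogate
    PRIMARY KEY (order_id, line_no)
  );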

So in short: never concatenate columns; find the business key; add a surrogate key when needed but not to replace the business key.

Profile photo for Andrew Droffner

Each table row has exactly one primary key. A primary key column is both unique and not null. This primary key is used to identify a row uniquely.

Other columns may be unique to prevent input errors. Also, the UNIQUE constraint creates an index in the background which may make queries faster.


Sample Table

  CREATE TABLE PLAYER (
    ID INTEGER PRIMARY KEY,
    FIRST VARCHAR(10),
    LAST VARCHAR(100) NOT NULL,
    JERSEY_NUMBER INTEGER UNIQUE NOT NULL
  );

PRIMARY KEY Example

The SELECT below should return only one row because it matches the PRIMARY KEY column.

  -- ID IS PRIMARY KEY
  SELECT * FROM PLAYER
  WHERE ID = 1;

UNIQUE Example

The new player row is inserted into the database. The player’s jersey number cannot be the same as anyone else’s, and the UNIQUE constraint makes sure this is true.

  INSERT INTO PLAYER
    (ID, FIRST, LAST, JERSEY_NUMBER)
  VALUES (2, 'Jane', 'Seymour', 12);
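
To see the constraint do its work, a follow-up insert that reuses jersey number 12 would be rejected (the exact error message varies by database):

  -- Fails with a unique-constraint violation: JERSEY_NUMBER 12 is already taken.
  INSERT INTO PLAYER
    (ID, FIRST, LAST, JERSEY_NUMBER)
  VALUES (3, 'John', 'Elway', 12);
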
Profile photo for Don Humberson

Fair warning this is a rather long answer to a relatively short question.

This answer will be confined to the design of tables for relational databases. I am going to restate your question a bit in the hope that what I'm reading into it is the actual question you were asking. If it's not please tell me where it's not and I'll try to focus my answer differently.

I would restate your question as “Why should we use surrogate keys instead of natural keys when designing tables for database applications?”

Think of relational design as a stepwise process. An early part of that process is identifying the business entities that go into a specific data domain, the attributes naturally contained within each entity, and how we are going to uniquely identify each distinct set of values contained within a given entity (each such set of values is called a tuple). Notice that I have not used the terms table, column, or row? That is not simply being fussy, it is done to emphasize that we are dealing with a set of abstractions from business processes that we will be turning into database tables and their relations.

If we're being careful about this first level of design we will only identify attributes within each entity which are meaningful within the context of that entity. Simple examples would be things like personal name or family name in a person entity, company legal name or company doing business as name in a vendor entity, or main body color in an automobile inventory entity. We will keep identifying such attributes until we have captured all of the attributes that our business partners think of as part of a given entity, and then we may add additional attributes that we realize are an implicit part of the way the business people view this particular entity. One of the ways that we know we have captured enough information about a given entity is when the business partner is able to identify every tuple in that entity uniquely. Since we have only captured meaningful information about the entity in these attributes the set of attributes which uniquely identify the tuple must in fact be attributes which are meaningful in the business context.

Designers call these meaningful identifiers natural keys. If a single attribute uniquely identifies the tuple we call that a single part natural key, if it takes more than one we call that a multiple part or multi-part natural key. Whatever number of parts are needed, the set of attributes needed to identify that tuple constitute primary key of that entity. By definition this primary key is the natural key.

Everything we have done so far falls into the domain of something called logical data design. There are a large number of rules that help a database designer construct a robust design which follows best practices for relational database design. Once this logical design is built, reviewed, and validated it is time to turn to the next step, which is called physical design. The physical design will in turn be used to produce the actual code which will create the real database in whatever brand of relational database the client desires. There are a very large number of additional decisions needed in this step, but a crucial part of making the leap from logical to physical correctly is to ensure that all of the information captured in the logical model is correctly reflected into the physical.

For the specific answer to why use surrogate keys, I'm not going to dive into the process of normalization or show the complexities that can occur when multipart natural keys are propagated across several generations of children in identifying relations. Suffice it to say that one additional driver towards surrogation is the size to which a grandchild or great-grandchild's natural key can grow.

This is where the process of surrogation may occur. Under some circumstances the designer may choose to replace a natural key with a simple numerical sequence, often just an integer that is incremented once for each row inserted in a given table. Such an integer still uniquely identifies each row, but it carries no information whatsoever about the actual values of the columns that were derived from logical attributes. That is where the term surrogate key comes from, it is a legitimate primary key which stands in for, or is a surrogate for, the natural key of that table.

Typical reasons for surrogation include replacing multiple long strings in a natural key with a much smaller and more easily manipulated single value, replacing the natural key with a value that is guaranteed to always be one step larger than any previous value, and replacing a natural key whose business processes may allow value changes over time with a value which will remain fixed for the entire lifespan of the given row. Such surrogate keys may also produce much more compact indexes, especially where the database software uses copies of the key as a way to enhance index usability. There is an entire additional discipline of physical design called dimensional database modeling which virtually demands surrogate keys for the tables it calls dimensions. Please be aware that surrogate keys do also have drawbacks.
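
A sketch of that kind of surrogation (generic SQL; the insurance-flavored names are purely illustrative):

  -- Natural key: (policy_number, endorsement_number, effective_date) -- long and multi-part.
  CREATE TABLE policy_endorsement (
    endorsement_id     INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate
    policy_number      VARCHAR(20) NOT NULL,
    endorsement_number INT NOT NULL,
    effective_date     DATE NOT NULL,
    UNIQUE (policy_number, endorsement_number, effective_date)        -- natural key still enforced
  );

  -- Children carry one small integer instead of the whole three-part key.
  CREATE TABLE endorsement_document (
    document_id    INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    endorsement_id INT NOT NULL REFERENCES policy_endorsement (endorsement_id)
  );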

I hope going into this level of detail has been helpful. If I get the chance I will expand this answer with a few web references that can go far more deeply into the specific pieces that I have summarized above.

Profile photo for Curtis Ruck

UUID's are valuable because they are not sequential. In a distributed database, or even a normal relational database, generating incrementing numbers typically creates a LOCK, in that only one insert can generate a number at any one time. Various RDBMS's have mechanisms (preallocation) to get around this limitation, but then in most implementations you lose the assumed one-upness of the value.

With UUID's you can generate the UUID inside the database or within the application. Due to the quantity of available UUID's, you can safely assume that your generated UUID is unique, quickly store it, and use it inside other records within the same transaction.

Lastly, I've found that optimizing UUID's at the database level involves storing them in byte form instead of as strings. This means the UUID takes 16 bytes to store, instead of 32 (without hyphens) or 36 bytes (with hyphens). When these UUID's in byte form are indexed, the index is at least 50% smaller, which at high record counts shaves some time off a database fetch due to fewer B-Tree blocks being required to store the index, and a shorter tree height.
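
A sketch of that byte-form storage (MySQL 8.0 syntax assumed, where UUID_TO_BIN and BIN_TO_UUID handle the conversion; the table is invented):

  CREATE TABLE events (
    event_id BINARY(16) PRIMARY KEY,   -- 16 bytes instead of a 36-character string
    payload  JSON NOT NULL
  );

  INSERT INTO events (event_id, payload)
  VALUES (UUID_TO_BIN(UUID()), '{"k": 1}');

  -- Convert back to the readable form when reading.
  SELECT BIN_TO_UUID(event_id) AS event_id, payload FROM events;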

Profile photo for Martin O'Shea

For an actual database table, given its placement and relationship(s) with other table(s) in the database, use of one or more columns for a primary key, to uniquely identify each row in the table, is absolutely essential. This uniqueness is one of the basic concepts of the relational database paradigm known as entity integrity.

The only occasion when I would not use primary keys is for data transfer, i.e. (1) when data is being either imported from a flat file into a database table, or (2) data is being exported from a database table into a flat file. In both cases the database table(s) used would have pre-defined primary key(s), but the flat files would not, even if they are temporarily resident in the database.

Profile photo for Greg Kemnitz

In addition to the schema-design-based relational DB reasons to have a PK…

You’d rarely not use a PK, particularly in MySQL InnoDB. If you don’t have a PK in InnoDB, it will auto-generate one for you and store it in the table and in any indexes based on the table. As this auto-generated PK is composed from all the column values in your row, it can take a lot of space in base table storage and in index storage.

Some situations where you may not need a PK:

  • Work tables and temporary tables. These are often stored using the MEMORY storage engine and usually don't have secondary indexes. You *can* have a PK on these, but when you do, it's usually because you're actively using the work table PK for some purpose such as guaranteeing uniqueness.
  • Related to the above, you are loading a CSV file from disk or doing some other big load or import, and you will build the final table using two or more load tables. You often load into intermediate tables that may be quite large but that you'll DROP once you're done with the import (see the staging-table sketch below). Note that this is also the only situation where I might use the MyISAM engine in MySQL.
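
A rough sketch of that staging pattern in MySQL, with hypothetical table and file names; the final table keeps its primary key while the throwaway load table does without one:

-- Final table with a proper surrogate PK
CREATE TABLE imported_customers (
  customer_id  INT AUTO_INCREMENT PRIMARY KEY,
  company_name VARCHAR(200) NOT NULL,
  country_code CHAR(2) NOT NULL,
  signup_date  DATE NOT NULL
);

-- Throwaway load table: no PK, no secondary indexes, dropped after the import
CREATE TABLE load_customers (
  company_name VARCHAR(200),
  country_code CHAR(2),
  signup_date  DATE
) ENGINE=MyISAM;

-- Bulk-load the CSV (file location and privileges permitting)
LOAD DATA INFILE '/tmp/customers.csv'
INTO TABLE load_customers
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES;

-- Build the real table from the staging data, then throw the staging table away
INSERT INTO imported_customers (company_name, country_code, signup_date)
SELECT company_name, country_code, signup_date FROM load_customers;

DROP TABLE load_customers;
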
Profile photo for Mike

It's been a long time since I worked on databases, so I'll answer as best I can. I assume you're working with a well-normalized database. We had several tables that had more than one column with unique values. Some of these values were simpler than others - i.e. a unique number such as a social security number vs. a street address. It was easier and more efficient to use the unique number. Other times we had many tables that held information about people. Each person had a unique ID, and it was easier to write table-join SQL statements where these tables joined on that unique ID.

Profile photo for Quora User

Yes, an auto-incrementing identity column as the primary key is quite appropriate for most uses.

However, if you want to use the ID as a token in a URL, but you don’t want anybody to be able to do unexpected — or unauthorized — things, like changing the numerical ID in the URL to try and access a resource incorrectly, then using a UUID would be very appropriate.

Hope that helps!

Profile photo for Andrew Clarke

This is because all RDBMSs use a paging system to prevent it. This paging system was developed long before RDBMSs, because it was essential for early computers that had very little memory. At its simplest, data is kept in blocks or pages on disk, and when the application asks for a block of data, it is returned a pointer to the block in memory. The application has a finite number of blocks, and if the requested block is not in memory, the least-recently-used (LRU) memory block is released and that memory is reallocated to read in the requested page of data from disk. As far as the application/database is concerned it has the resources to read the data, but under the covers it is all being swapped around between disk and memory to cope.

Naturally, it isn't a free lunch. Other threads may be trying to do the same thing, and those other threads lose all their caching. Hogging the resources is always possible, but it is bad news in a multi-user system, just like eating all the pie in the canteen. You should request only the data you actually need in any multi-user database, even when it doesn't seem to max out the disk, because it sure slows everything down.
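
To make the closing point concrete, a hypothetical contrast (the orders table and its columns are made up): the first query drags every page of the table through the buffer pool, while the second touches only the few pages it needs, ideally via an index on (customer_id, placed_at).

-- Pulls every column of every row through the page cache, evicting other users' hot pages
SELECT * FROM orders;

-- Reads far fewer pages: only the needed columns, filtered and limited
SELECT order_id, total
FROM orders
WHERE customer_id = 42
ORDER BY placed_at DESC
LIMIT 20;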

Profile photo for Arthur Fuller

In MySQL, a sequence is a list of integers generated in ascending order i.e., 1,2,3…. Most often, sequences are used to create unique identifiers, but if that’s what you’re after, relax. MySQL can generate one for you automatically. You use the AUTO_INCREMENT attribute when specifying the column in your CREATE TABLE command:

CREATE TABLE employees (
  emp_no     INT AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(50),
  last_name  VARCHAR(50)
);

A table can have only one AUTO_INCREMENT column.

Sometimes you need to know the last number generated. To find this, use the LAST_INSERT_ID() function.
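
For example, continuing with the employees table above (employee_badges is a hypothetical related table):

-- Hypothetical related table keyed by the generated employee number
CREATE TABLE employee_badges (
  emp_no     INT NOT NULL,
  badge_code VARCHAR(20) NOT NULL
);

INSERT INTO employees (first_name, last_name) VALUES ('Ada', 'Lovelace');

-- LAST_INSERT_ID() returns the AUTO_INCREMENT value generated by this connection's last insert
SELECT LAST_INSERT_ID();

-- It is often used immediately to key a related row
INSERT INTO employee_badges (emp_no, badge_code)
VALUES (LAST_INSERT_ID(), 'B-1001');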

Profile photo for Yinso Chen

UUIDs are great for creating globally unique identifiers that you can count on for basically zero collision [1]. That implies the following:

  • You can create them in different databases with confidence that you can merge the table without collisions
  • You can create them anywhere in your app rather than relying on the database to create them (so you don't need to insert first and then retrieve the IDs)
  • Because they are globally unique - you can consider them as truly immutable IDs for the records (usually by convention, since most databases don't provide write-once columns [2]).

They are a great candidate for unique keys if there are no natural keys for the data in question.

UUIDs are not without downsides, of course:

  • 32-digit numbers are cumbersome to remember and use, especially if your customers are going to see them - for those records it's better to have an alternative ID format.
  • not all databases support storing UUIDs in byte form, so they can take up more space. In practice this is not a big deal, since most databases have ample disk space and UUIDs are seldom the main culprit for space consumption.


The biggest problem, though, is using UUIDs with clustered indexes. A clustered index orders the underlying physical table, so when new UUIDs arrive out of order they cause page splits and fragmentation as rows are slotted into place, which can have a big performance impact on large tables [3]. If I recall correctly, MS SQL Server (and possibly MySQL) uses a clustered index for the primary key unless you specify otherwise.

My usual practice is to use an int/bigint auto-increment for primary keys, and a separate unique key with a UUID, so that I don't have to remember which database defaults to clustered primary keys.
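
A sketch of that layout in MySQL (the table is hypothetical): the sequential bigint keeps the clustered index append-only, while the UUID is exposed externally through a secondary unique index.

CREATE TABLE accounts (
  id          BIGINT AUTO_INCREMENT PRIMARY KEY,  -- clustered key, always inserted in order
  public_uuid BINARY(16) NOT NULL,                -- identifier handed out to clients / other systems
  email       VARCHAR(255) NOT NULL,
  UNIQUE KEY uk_accounts_public_uuid (public_uuid)
);

-- External lookups go through the secondary unique index
SELECT id, email
FROM accounts
WHERE public_uuid = UUID_TO_BIN('3f0c9c2e-8a4b-4c6e-9a2d-1f2e3d4c5b6a');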

---
1 - The quality of the generation depends on the random number generator (for v4 UUIDs) - see Universally unique identifier.

2 - This can be enforced programmatically in systems that support triggers.

3 - Note this is not specific to UUIDs; any data type that doesn't get inserted sequentially will cause the same problem, so it's better to think of this as a property of clustered indexes instead - use them with care.

Profile photo for Alan Groves

I once had to create a table with the merged contents from two other tables. I wanted to keep the original keys, so the new table would continue to join to other tables. The problem was, both tables used auto numbers (identity columns) with the same range.

In the end, I negated one of the keys so the combined key remained unique. Some trickery using absolute values was needed to perform joins. SQL Server always performs better with numbers, so this worked well.
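
A rough reconstruction of that trick, with hypothetical table names since the original schema isn't given in the answer:

CREATE TABLE orders_siteA (order_id INT PRIMARY KEY, customer_name VARCHAR(100));
CREATE TABLE orders_siteB (order_id INT PRIMARY KEY, customer_name VARCHAR(100));
CREATE TABLE combined_orders (order_id INT PRIMARY KEY, customer_name VARCHAR(100));

-- Negate site B's keys so the two overlapping identity ranges stay unique in the merged table
INSERT INTO combined_orders (order_id, customer_name)
SELECT order_id, customer_name FROM orders_siteA
UNION ALL
SELECT -order_id, customer_name FROM orders_siteB;

-- Joining the merged rows back to site B then needs ABS() to undo the negation
SELECT c.order_id, b.customer_name
FROM combined_orders AS c
JOIN orders_siteB AS b ON b.order_id = ABS(c.order_id)
WHERE c.order_id < 0;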

Profile photo for Greg Kemnitz

Actually, it depends on what you're doing with the table. In most situations, a fat primary key is a Bad Idea, especially in MySQL InnoDB, where the primary key is stored as the lookup key in secondary indexes.

A "wide" primary key (whether based on a single long column value or multiple columns) will make secondary indexes larger and more expensive, causing inefficient use of buffer pool, slower inserts and other operations, and general bad performance if you have a lot of secondary indexes.

That said, a carefully-chosen multi-column primary key *can* be a very good thing for specific types of use-cases. In InnoDB, the primary key effectively specifies the ordering and clustering in the B-Tree used to store the row. If you have a multi-column PK, the first column will be used as the major ordering, the second column as the secondary ordering within the major ordering, and so on.

An example of where this can be useful is if you have a set of User records with a largish set of properties that are loaded to your app every time a User logs in (a fairly common situation in online games, blog sites, etc). A table structured something like this

create table UserProperties (
  UserID bigint,
  PropertyTypeID bigint,
  PropertyValue varchar(256),
  primary key (UserID, PropertyTypeID, PropertyValue)
) engine=InnoDB;

would work nicely, as all the properties of a given User will be clustered adjacently in InnoDB storage, guaranteeing that a User property load will happen quickly, and also likely guaranteeing that Property updates for an active User will be on pages loaded in the buffer pool.
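
For instance, the per-user property load described above becomes a short range scan over adjacent rows in the clustered index (the query itself is just an illustrative example):

-- All of one user's properties sit next to each other in the clustered B-Tree,
-- so this reads only a handful of adjacent pages
SELECT PropertyTypeID, PropertyValue
FROM UserProperties
WHERE UserID = 12345;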

If you need to do a lot of property type searches that span Users, this structure would be less good as you'll need an expensive secondary index, but other solutions may be useful as per your application requirements.

Profile photo for Joachim Pense

There seem to be different schools. Those with a database theory background often say that artificial IDs are evil, and only natural keys are allowed, because they represent reality. Many practitioners say that natural keys are evil, because they are rarely unique, are often more volatile than expected, and can be long and bulky, too.

For data warehouses, natural keys are a no-no; generated surrogate keys should be used, even for things like dates. The reason: bulkiness has a high impact on the huge fact tables involved, and because the data are stored for a very long time, stability of the keys is even more of a requirement.
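
A common warehouse sketch of that idea, with hypothetical table names: even the calendar date gets a small surrogate key, and the (huge) fact table stores only compact, stable surrogates.

CREATE TABLE dim_date (
  date_key       INT PRIMARY KEY,    -- surrogate key, e.g. 20250131
  calendar_date  DATE NOT NULL,
  calendar_year  SMALLINT NOT NULL,
  calendar_month TINYINT NOT NULL
);

CREATE TABLE fact_sales (
  date_key    INT NOT NULL,          -- small, stable surrogate instead of a bulky natural key
  product_key INT NOT NULL,
  quantity    INT NOT NULL,
  amount      DECIMAL(12,2) NOT NULL
);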
