My questions is regarding Database Modeling. I tried to look for this question amongst other Database Designing questions on SO but haven't found it and so here am asking about:

What are the general guidelines and best practices to keep in mind while designing database for an application ?

What are the best resources/books/University Lectures available on Database Design Concepts ?



Just some things I've learned from experience (I'm sure some will disagree, but I've been querying and designing and programming databases for 30+years and have seen the effects of stupid design up close and personal):

There are three critical things to consider in all database design - data integrity (without this you essentially have no data), security and performance. All other considerations take a back seat to these three.

Never create a table without a way to uniquely identify a record.

There really are very few true natural keys that really work as a primary key, if you don't have control over whether it will change, do not use it as a primary key (you don't really want to change the company name through 27 child tables do you?). Use a surrogate key instead. Using a surrogate key does not exempt you from the need to set unique indexes if you could have used a unique composite key. Always set these indexes if you can determine a way to have a unique composite. Duplicate records are the bane of an application's existance. It seems obvious but never ever consider name to be a key field, names are not and never will be unique.

Do not use a GUID as your primary key as it can kill performance. If you need a guid for replication also consider having an int or big int primary key.

Do not design as if you will be changing database backends unless you know up front you will be doing so. Virtually all the good techniques for performance tuning are database specific, don't harm your own ability to tune your database for a non-existant requirement.

Avoid value-entity table structures. They are miserable to query.

Add all things you need to ensure data integrity into your database design, things like defaults, constraints, triggers, etc. are necessary to avoid having useless data. Do not rely on the application code to do this or you will be sorry.

Others mentioned normalization, I agree you must understand this thoroughly even if you later decide to denormalize.

Do not stack views on top of views if you want any kind of performance at all. Every database I've seen that does this is eventually a huge performance problem.

Consider what data you will need to manage the database as well as what the application needs. If you are going to be serious about databases you need to understand database auditing and your database should implement ways to find out who made what change and when and what the old data was. You'll thank me the first time someone malicious changes the data or someone deletes all the records in a table accidentally.

Really think through how the data will be queried when designing. It can make a huge difference in the design.

Do not store more than one piece of information in a field. It might look cool to put a comma delimited list into one field rather than add a related table but it is a really bad idea.

Elegance is often the enemy of performance in databases. Pick performance over elegance every time and you won't go wrong.

Avoid the use of database keywords in the naming of objects. Your programmers will thank you. Pick a naming convention and be consistent in always using it. If a field is in mulitple tables make sure it is the same name (exception if an id field has two possible foreign keys in the same table use the id field name and a prefix to identify the differnce between say Sales_person_id and Customer_person_id), same datatype and length, if applicable in all of them. Fix misspellings right away, you really don't want to spend the next ten years remembering that in tablea it is the persnoid instead of personid.

Read about database refactoring (search on amazon for some good books) and consider how to be able to do this in your design. Few databases are designed to be refactored and being able to do so is critical towards being able to fix database problems that arise from badly thought out designs or changes to business requirements.

While you are reading, read about performance tuning, you'll learn a tremendous amount about what to avoid in designing the database.

I'm sure there's more but this is enough to start with.

One addtional thing I wanted to add. Do not design your database as if the data entry application page is the most critical thing. Data is often queried more often than it is written even in a transactional database. Really think about how easy it will be to to get data back out of the database (Oh so that's why the EAV model is so bad!) and what effect the design will have on reporting. This is espcially critical as I often see that the people doing the reporting are not the people who design the database or reporting tasks are later in the project than createing the data entry. Databases are not easy to refactor, consider the whole life cycle of the data when designing a database. Think about things like storing moment in time values as you can't find out how much an order was for two years later by multiplying the quantity ordered by the price in the products table as that wasn't the price at the time of the order. Reporting needs this type if information, but it often too late to get it by the time the reports are written when the design is done badly. Stuff that works fine when you are handling one record at a time can be a disaster when you need to look at thousands or millions of records. Not every application is going to create a separate reporting datbase, so really think about this.



this question is like saying "what is the best car to buy", it really depends on many factors including amount of data, number of concurrent users, what you are trying to do, etc. FYI, normalization is good for some database uses, but bad for others (data warehouse).

Give us a better idea of how you intend to use the data, and you'll get some better recommendations.


While I agree with others that your question right now is much too broad and can't really be answered (except for the "it depends" approach :-)), there is one book I would wholeheartedly recommend for anyone beginning database design in general:

Michael Hernandez: Database Design for Mere Mortals(R): A Hands-On Guide to Relational Database Design

alt text http://ecx.images-amazon.com/images/I/61XMDo4BFkL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg

It's a really hands-on, no-frills, down to earth book and introduces all the major and important concepts in a very understandable, very approachable fashion. Well written, interesting, very sound and useful - highly recommended!



your question is too broad. Normalization and denormalization are most used concepts.


The best thing to do is to start with a well normalized database. The wikipedia article has some good information on that along with some good references.

Typically you'll end up denormalizing parts of your database for better performance, but you almost always want to start with it in 4th normal form.


Look at wikipedia article about database normalization. There is also further reading section.

If you design a new database for brand new application you should try use ORM library (like JPA implementations in Java) that release you from database design, because these tools generate database from domain model. If you don't have any experience in this field - database generated with ORM tools will be much better of yours.


Consider all your use cases. Think about every single possible way someone might want to get to data, and plan for those. Wear your designer, developer, tester, and user hats.

Try to think of database tables as representing physical objects.

Normalize, as others have said.