问题描述:

Imagine you have a table POSTS with 50 million posts. This table has:

ID, POST, CATEGORY_ID

1, "Hello world", 2

And then you have a CATEGORIES table with:

ID, CATEGORY_NAME

1, "Football"

2, "Baseball"

On the website, you list these posts in ORDER DESC, showing the post and the category name using a join.

[Baseball]

Hello World!

I'm thinking on adding (when someone create a new post) the CATEGORY_NAME column to the POSTS table in order to avoid having to make that JOIN everytime I get a visit.

In order of performance, is a good idea? Imagine you have 10.000.000 visitors daily (hopefully :))

On the other hand, I would like to know what DB engine you consider better for it, mysql? mariadb? mongodb?

EDIT:

Imagine a more real scenario: I need 3 joins to show POST + CATEGORY + USER_NAME.

POSTS: 50 million rows

CATEGORIES: 100 rows

USERS: 10 million rows

Option A) Join between the 3 tables to show:

[Baseball]

Hello World

- By John

Option B) Adding 2 columns (CAT_NAME, USER_NAME) varchar(25) to POSTS table in order to avoid the join.

网友答案:

Keep things normalized.

The cost of the JOIN is less than the cost caused by having a plumper table. Currently (small amount of data) you cannot see much performance difference. When the tables are too big to cache in RAM, the performance benefit of normalization will shine.

相关阅读:
Top