问题描述:

I'm designing a program where the user makes a single choice on thousands (or potentially millions) of people. I've thought of 2 ways of storing this in a database:

1) a separate row for each entry

2) a single long text that just appends a choice for a new person or modifies a choice for an existing person.

I'd imagine separate rows for each entry should be more efficient, but if we're talking about, let's say, hundreds of thousands of entries, then what is the network overhead I'm looking at for queries on that versus just returning a single long text and using the user's cpu to parse the text?

As an example, a single long text might be something like:

Data

[Person A:Choice A][Person B: Choice A][Person C: Choice C]...[Person n:Choice n]

Whereas multiple rows obviously would be:

Person Choice

A A

B A

C C

....

n n

Maybe I'm not thinking of this in the right way in the first place. Is there a more efficient way of doing something like this?

Thanks for your input.

网友答案:

I'll put my comments in an answer and expand in places.

Regarding your decision of string vs Table. Table every time.

A design based on table Person (Id, Name) , table Choice (Id, Value) and table PersonChoice (Id, PersonId, ChoiceId). Will give you an indexable, searchable and flexible solution.

Hiding data in text columns in SQL is a very bad idea - obviously ignoring XML data and its datatype. But that doesn't apply here.

One solution for adding statistics at a later date could be have scheduled SQL Agent jobs running off the data parsing what and when changes were made and storing that data in separate "reporting" tables.

Something to consider in your design - to save yourself having to store and manipulate 1000s of rows - is the idea of grouping choices together. Could save you a great deal of work (both for yourself and the server).

Welcome to the world of database design!

网友答案:

I'd imagine separate rows for each entry should be more efficient, but if we're talking about, let's say, hundreds of thousands of entries, then what is the network overhead I'm looking at for queries on that versus just returning a single long text and using the user's cpu to parse the text?

Network overhead (as a difference between the two designs) is negligible. Essentially all you're sending to the server is your query, and all the server returns is the result set. If the result of your query is one row, the server returns only one row. If the result of your query is 10,000 rows, the server sends back 10,000 rows.

The real overhead is in execution speed on the server and in maintenance. The server will find indexed rows in a table quickly if you use an index. But finding the 17 in a single value like "1, 2, 3, 5, 6, 7, 8, 10, 13, 17, 18, 30, 27" probably won't use an index.

Values like that also lose type safety and the ability to use foreign key references and cascades.

相关阅读:
Top