Unix Technical Forum

Planning for Scalability & Replication

This is a discussion on Planning for Scalability & Replication within the MySQL forums, part of the Database Server Software category; --> I'm currently developing a social networking site that may, hopefully, in the future need to scale out to multiple, ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 01:05 AM
Steve_C
 
Posts: n/a
Default Planning for Scalability & Replication

I'm currently developing a social networking site that may, hopefully, in
the future need to scale out to multiple, maybe even many, database servers.
Will it happen? Who knows, but I'd like to be prepared for this now, so
adding additional servers will be quick and somewhat painless. Even if it
never happens, at least I'll learn something new.

So, this is what I'm thinking....

a database for basic user info
a database for profiles
a database for messages
and so forth

If the site gets to the point where it needs to have additional servers,
there will be the option of having the databases on their own servers which
could each serve as masters in the future for replication if needed. The
database of basic user info would be replicated across all servers so the
info can be used for joins (to get usernames when querying messages for
example), since cross database joins require the databases to be on the same
server.

Since multiple database servers and replication is new to me, I thought I'd
get some input from you guys here. Does this sound like a good way to go?
Or, is there a better way to do this? I am aware that I could just put
everything into one database and replicate it out to many servers, but I
figured that since some databases will be used far more often than others,
it would be better to separate them out so that the most used databases
could possibly get more slaves than the less used ones.

Thanks in advance for any input, suggestions, etc.

Steve



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-15-2008, 08:32 PM
Jerry Stuckle
 
Posts: n/a
Default Re: Planning for Scalability & Replication

Steve_C wrote:
> I'm currently developing a social networking site that may, hopefully, in
> the future need to scale out to multiple, maybe even many, database servers.
> Will it happen? Who knows, but I'd like to be prepared for this now, so
> adding additional servers will be quick and somewhat painless. Even if it
> never happens, at least I'll learn something new.
>
> So, this is what I'm thinking....
>
> a database for basic user info
> a database for profiles
> a database for messages
> and so forth
>


Why? That will make virtually everything unmanageable.

> If the site gets to the point where it needs to have additional servers,
> there will be the option of having the databases on their own servers which
> could each serve as masters in the future for replication if needed. The
> database of basic user info would be replicated across all servers so the
> info can be used for joins (to get usernames when querying messages for
> example), since cross database joins require the databases to be on the same
> server.
>


And what happens if you need to do cross-database joins? i.e al the
messages for a particular user (from basic user info and messages)?

> Since multiple database servers and replication is new to me, I thought I'd
> get some input from you guys here. Does this sound like a good way to go?
> Or, is there a better way to do this? I am aware that I could just put
> everything into one database and replicate it out to many servers, but I
> figured that since some databases will be used far more often than others,
> it would be better to separate them out so that the most used databases
> could possibly get more slaves than the less used ones.
>
> Thanks in advance for any input, suggestions, etc.
>
> Steve
>


Not at all. If you want to go this route, you need to hire someone
who's been there before. It's not something you should try to do by
yourself if you've never done it before.

But if you insist (and I know you will) if the data are related, they
should all be in one database. Have a master and multiple slaves.
Update the master, request data from the slaves.

Or, better yet - just put one database on a huge server by itself, and
have everyone do it from there.

But chances are about 1 in a million that you'll EVER get that much
traffic. Much more probable is that you'll be able to do everything on
shared hosting and won't even need a single dedicated server. There are
thousands of "social networking" sites out there. There are maybe a
dozen who are doing much more than paying their hosting costs.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-15-2008, 08:32 PM
Steve_C
 
Posts: n/a
Default Re: Planning for Scalability & Replication


"Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
news:VeCdnZCuIdODi53VnZ2dnUVZ_remnZ2d@comcast.com. ..
> Steve_C wrote:
>> I'm currently developing a social networking site that may, hopefully, in
>> the future need to scale out to multiple, maybe even many, database
>> servers. Will it happen? Who knows, but I'd like to be prepared for this
>> now, so adding additional servers will be quick and somewhat painless.
>> Even if it never happens, at least I'll learn something new.
>>
>> So, this is what I'm thinking....
>>
>> a database for basic user info
>> a database for profiles
>> a database for messages
>> and so forth
>>

>
> Why? That will make virtually everything unmanageable.
>


Why do you think this would be unmanageable? It's called vertical
partitioning and it's pretty common.


>> If the site gets to the point where it needs to have additional servers,
>> there will be the option of having the databases on their own servers
>> which could each serve as masters in the future for replication if
>> needed. The database of basic user info would be replicated across all
>> servers so the info can be used for joins (to get usernames when querying
>> messages for example), since cross database joins require the databases
>> to be on the same server.
>>

>
> And what happens if you need to do cross-database joins? i.e al the
> messages for a particular user (from basic user info and messages)?
>


That issue is already addressed above. Every server has a replicated copy of
the basic user info (the information that all databases will need) database
so cross-database joins is a non-issue.

>> Since multiple database servers and replication is new to me, I thought
>> I'd get some input from you guys here. Does this sound like a good way
>> to go? Or, is there a better way to do this? I am aware that I could
>> just put everything into one database and replicate it out to many
>> servers, but I figured that since some databases will be used far more
>> often than others, it would be better to separate them out so that the
>> most used databases could possibly get more slaves than the less used
>> ones.
>>
>> Thanks in advance for any input, suggestions, etc.
>>
>> Steve
>>

>
> Not at all. If you want to go this route, you need to hire someone who's
> been there before. It's not something you should try to do by yourself if
> you've never done it before.
>
> But if you insist (and I know you will) if the data are related, they
> should all be in one database. Have a master and multiple slaves. Update
> the master, request data from the slaves.


A single database on a master would eventually max out (much, much faster
than having the databases separated), which is why partitioning comes into
play. I'm just thinking that we should start out partitioning immediately,
to allow for quick and painless scaling as needed.

>
> Or, better yet - just put one database on a huge server by itself, and
> have everyone do it from there.
>


....which can get very, very expensive. We are looking to use many low-cost,
64-bit servers that can be added as needed rather than shelling out a huge
sum of money for one big server.

> But chances are about 1 in a million that you'll EVER get that much
> traffic. Much more probable is that you'll be able to do everything on
> shared hosting and won't even need a single dedicated server. There are
> thousands of "social networking" sites out there. There are maybe a
> dozen who are doing much more than paying their hosting costs.


Yes, there are thousands of social networking sites and almost all of them
are severly lacking. The overwhelming majority of them have absolutely no
marketing whatsoever, either. We aren't trying to create the next MySpace,
Friendster, etc. here. It's actually significantly different. However, once
the site is launched we will receive some free publicity on a few major
cable networks as well as several national publications (it pays to know the
right people and, even better, be related to the right people). So, the
potential for rapid growth is much higher than with ordinary circumstances.
We've just been saving our connections for the right project.

While your suggestions may work for most situations (and would probably work
in ours), the big server idea is probably too costly and all tables in a
single database with a master and slaves would be outgrown much sooner than
what I had originally suggested. Vertical partitioning, as I suggested,
divides up the workload (both reads and writes) much more than with a single
database and allows for much more scaling. Partitioning out an existing
single database when its limits are reached would be much more difficult and
time consuming than if the database was partitioned from the get-go. The
weak link is the horizontal replication of the user information across all
servers. However, if we reach the point (doubtful, but who knows) where
vertical partitioning is stumbling, we should be making more than enough
money to cover the costs of a solution.

I was just wondering if anybody had any other suggestions (preferrably
someone who at the very least knows what partitioning is).


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-15-2008, 08:32 PM
Jerry Stuckle
 
Posts: n/a
Default Re: Planning for Scalability & Replication

Steve_C wrote:
> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
> news:VeCdnZCuIdODi53VnZ2dnUVZ_remnZ2d@comcast.com. ..
>> Steve_C wrote:
>>> I'm currently developing a social networking site that may, hopefully, in
>>> the future need to scale out to multiple, maybe even many, database
>>> servers. Will it happen? Who knows, but I'd like to be prepared for this
>>> now, so adding additional servers will be quick and somewhat painless.
>>> Even if it never happens, at least I'll learn something new.
>>>
>>> So, this is what I'm thinking....
>>>
>>> a database for basic user info
>>> a database for profiles
>>> a database for messages
>>> and so forth
>>>

>> Why? That will make virtually everything unmanageable.
>>

>
> Why do you think this would be unmanageable? It's called vertical
> partitioning and it's pretty common.
>


Yes, it's an attempt to implement vertical partitioning, but you're
going about it the wrong way.

For instance - how are you going to join tables in completely different
databases, running on completely different servers? How are you going
to update multiple databases on different servers - especially if one
goes down? The list will be endless. You're asking for a huge number
of problems.

>
>>> If the site gets to the point where it needs to have additional servers,
>>> there will be the option of having the databases on their own servers
>>> which could each serve as masters in the future for replication if
>>> needed. The database of basic user info would be replicated across all
>>> servers so the info can be used for joins (to get usernames when querying
>>> messages for example), since cross database joins require the databases
>>> to be on the same server.
>>>

>> And what happens if you need to do cross-database joins? i.e al the
>> messages for a particular user (from basic user info and messages)?
>>

>
> That issue is already addressed above. Every server has a replicated copy of
> the basic user info (the information that all databases will need) database
> so cross-database joins is a non-issue.
>


OK, and how are you going to keep them in sync? Replication takes time.
And trying to have multiple masters, each handling their own database
(but the databases are related) will soon run into problems.

>>> Since multiple database servers and replication is new to me, I thought
>>> I'd get some input from you guys here. Does this sound like a good way
>>> to go? Or, is there a better way to do this? I am aware that I could
>>> just put everything into one database and replicate it out to many
>>> servers, but I figured that since some databases will be used far more
>>> often than others, it would be better to separate them out so that the
>>> most used databases could possibly get more slaves than the less used
>>> ones.
>>>
>>> Thanks in advance for any input, suggestions, etc.
>>>
>>> Steve
>>>

>> Not at all. If you want to go this route, you need to hire someone who's
>> been there before. It's not something you should try to do by yourself if
>> you've never done it before.
>>
>> But if you insist (and I know you will) if the data are related, they
>> should all be in one database. Have a master and multiple slaves. Update
>> the master, request data from the slaves.

>
> A single database on a master would eventually max out (much, much faster
> than having the databases separated), which is why partitioning comes into
> play. I'm just thinking that we should start out partitioning immediately,
> to allow for quick and painless scaling as needed.
>


And where do you expect to max out? I've seen databases doing 5K
queries/second.

The first rule of optimizing is - DON'T TO IT PREMATURELY!


>> Or, better yet - just put one database on a huge server by itself, and
>> have everyone do it from there.
>>

>
> ...which can get very, very expensive. We are looking to use many low-cost,
> 64-bit servers that can be added as needed rather than shelling out a huge
> sum of money for one big server.
>


Much less expensive than buying and maintaining multiple servers.

>> But chances are about 1 in a million that you'll EVER get that much
>> traffic. Much more probable is that you'll be able to do everything on
>> shared hosting and won't even need a single dedicated server. There are
>> thousands of "social networking" sites out there. There are maybe a
>> dozen who are doing much more than paying their hosting costs.

>
> Yes, there are thousands of social networking sites and almost all of them
> are severly lacking. The overwhelming majority of them have absolutely no
> marketing whatsoever, either. We aren't trying to create the next MySpace,
> Friendster, etc. here. It's actually significantly different. However, once
> the site is launched we will receive some free publicity on a few major
> cable networks as well as several national publications (it pays to know the
> right people and, even better, be related to the right people). So, the
> potential for rapid growth is much higher than with ordinary circumstances.
> We've just been saving our connections for the right project.
>


OK, so you get a little advertising from cable networks. That will give
you a 2-3 day boost in traffic. After that about 98% of users will
never come back.

> While your suggestions may work for most situations (and would probably work
> in ours), the big server idea is probably too costly and all tables in a
> single database with a master and slaves would be outgrown much sooner than
> what I had originally suggested. Vertical partitioning, as I suggested,
> divides up the workload (both reads and writes) much more than with a single
> database and allows for much more scaling. Partitioning out an existing
> single database when its limits are reached would be much more difficult and
> time consuming than if the database was partitioned from the get-go. The
> weak link is the horizontal replication of the user information across all
> servers. However, if we reach the point (doubtful, but who knows) where
> vertical partitioning is stumbling, we should be making more than enough
> money to cover the costs of a solution.
>


And sorry, everyone who comes up with a "new idea" says exactly the same
thing you just did. Not to throw cold water on your idea, but I wish I
had $1.00 for every one like this. I'd be retired and living in the
Caribbean.

> I was just wondering if anybody had any other suggestions (preferrably
> someone who at the very least knows what partitioning is).
>
>
>


I know what partitioning is. I've been doing RDBMS's for over 20 years,
starting on IBM mainframes. And I know how NOT to implement it.

You asked. I told you why you shouldn't do what you want. But I see
you're going to do it anyway. But that's OK. It's your money and your
customers you will lose.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-15-2008, 08:32 PM
Michael Austin
 
Posts: n/a
Default Re: Planning for Scalability & Replication

Jerry Stuckle wrote:
> Steve_C wrote:
>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>> news:VeCdnZCuIdODi53VnZ2dnUVZ_remnZ2d@comcast.com. ..
>>> Steve_C wrote:
>>>> I'm currently developing a social networking site that may,
>>>> hopefully, in the future need to scale out to multiple, maybe even
>>>> many, database servers. Will it happen? Who knows, but I'd like to
>>>> be prepared for this now, so adding additional servers will be quick
>>>> and somewhat painless. Even if it never happens, at least I'll learn
>>>> something new.
>>>>
>>>> So, this is what I'm thinking....
>>>>
>>>> a database for basic user info
>>>> a database for profiles
>>>> a database for messages
>>>> and so forth
>>>>


>>> Why? That will make virtually everything unmanageable.


I agree with Jerry on this - been there done that, then did it right.

>>>

<snip>

I would say that the OP is trying to solve a problem he may never have.

What it sounds like he wants (and may never need) is multi-site,
mutli-server, bi-directional replication. The absolute most difficult
paradigm to do correctly.

Now if the database schema's are on the same server - then it is
relatively easy.
see http://forums.mysql.com/read.php?61,3063,4004#msg-4004

If you are thinking that your hardware will be cheap - think again. You
are not going to pull this off with a desktop from your local PC store -
You can start there, but you will be buying "bigger" iron when you have
performance issues (30sec-1min return rates), but by that time, you will
start losing people as they will not wait that long.

Storage will be your next challenge - and being able to configure
correctly and efficiently MYSQL to be multiFileSystem aware. Again, you
are not going to be able to use those "NFS" <n>TB "servers" from your
local PC shop. You will need REAL NAS or SAN storage and contrary to
popular belief - that stuff ain't cheap. Hmmmm.. actually that depends
on what you consider "cheap" or "inexpensive".

Michael Austin
Systems Design, Systems Integrator, DBA (design and production support),
Systems and SAN Admin on MANY different vendor hardware and operating
systems.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-15-2008, 08:32 PM
Jerry Stuckle
 
Posts: n/a
Default Re: Planning for Scalability & Replication

Michael Austin wrote:
> Jerry Stuckle wrote:
>> Steve_C wrote:
>>> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in message
>>> news:VeCdnZCuIdODi53VnZ2dnUVZ_remnZ2d@comcast.com. ..
>>>> Steve_C wrote:
>>>>> I'm currently developing a social networking site that may,
>>>>> hopefully, in the future need to scale out to multiple, maybe even
>>>>> many, database servers. Will it happen? Who knows, but I'd like to
>>>>> be prepared for this now, so adding additional servers will be
>>>>> quick and somewhat painless. Even if it never happens, at least
>>>>> I'll learn something new.
>>>>>
>>>>> So, this is what I'm thinking....
>>>>>
>>>>> a database for basic user info
>>>>> a database for profiles
>>>>> a database for messages
>>>>> and so forth
>>>>>

>
>>>> Why? That will make virtually everything unmanageable.

>
> I agree with Jerry on this - been there done that, then did it right.
>
>>>>

> <snip>
>
> I would say that the OP is trying to solve a problem he may never have.
>
> What it sounds like he wants (and may never need) is multi-site,
> mutli-server, bi-directional replication. The absolute most difficult
> paradigm to do correctly.
>
> Now if the database schema's are on the same server - then it is
> relatively easy.
> see http://forums.mysql.com/read.php?61,3063,4004#msg-4004
>
> If you are thinking that your hardware will be cheap - think again. You
> are not going to pull this off with a desktop from your local PC store -
> You can start there, but you will be buying "bigger" iron when you have
> performance issues (30sec-1min return rates), but by that time, you will
> start losing people as they will not wait that long.
>
> Storage will be your next challenge - and being able to configure
> correctly and efficiently MYSQL to be multiFileSystem aware. Again, you
> are not going to be able to use those "NFS" <n>TB "servers" from your
> local PC shop. You will need REAL NAS or SAN storage and contrary to
> popular belief - that stuff ain't cheap. Hmmmm.. actually that depends
> on what you consider "cheap" or "inexpensive".
>
> Michael Austin
> Systems Design, Systems Integrator, DBA (design and production support),
> Systems and SAN Admin on MANY different vendor hardware and operating
> systems.
>


Yep, I agree on all points.

And I should add - if he has the type of response he thinks he's going
to have, he should be brining in plenty of money. Since he seems so
confident with it, he should invest in the necessary hardware and
software NOW, even if it means taking out a loan. Then it will be in
place when he needs it.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-15-2008, 08:32 PM
Michael Austin
 
Posts: n/a
Default Re: Planning for Scalability & Replication

Steve_C wrote:
> I'm currently developing a social networking site that may, hopefully, in
> the future need to scale out to multiple, maybe even many, database servers.
> Will it happen? Who knows, but I'd like to be prepared for this now, so
> adding additional servers will be quick and somewhat painless. Even if it
> never happens, at least I'll learn something new.
>
> So, this is what I'm thinking....
>
> a database for basic user info
> a database for profiles
> a database for messages
> and so forth
>
> If the site gets to the point where it needs to have additional servers,
> there will be the option of having the databases on their own servers which
> could each serve as masters in the future for replication if needed. The
> database of basic user info would be replicated across all servers so the
> info can be used for joins (to get usernames when querying messages for
> example), since cross database joins require the databases to be on the same
> server.
>
> Since multiple database servers and replication is new to me, I thought I'd
> get some input from you guys here. Does this sound like a good way to go?
> Or, is there a better way to do this? I am aware that I could just put
> everything into one database and replicate it out to many servers, but I
> figured that since some databases will be used far more often than others,
> it would be better to separate them out so that the most used databases
> could possibly get more slaves than the less used ones.
>
> Thanks in advance for any input, suggestions, etc.
>
> Steve
>
>
>



Are stock options available?
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 12:20 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com