brad's life — LiveJournal
Brad Fitzpatrick
[ website | bradfitz.com ]
M:N DBI proxy, x86, InnoDB... [May. 9th, 2003|04:24 pm]
Brad Fitzpatrick

I want an M:N DBI- or MySQL-level proxy, where up to M threads from client processes are serialized over at most N real db connections. (Especially useful for the master db in a tree of replicated slaves, where the master db connections are largely idle, the slaves doing all the selecting....) Or maybe even a thread pool on the client machines themselves. I wonder if anybody's done that with mod_perl2 and Apache2 yet. (Is the DBI library threadsafe?)
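The core of the M:N idea can be sketched in a few lines: client threads block until one of the N real handles frees up. A minimal illustration in Python (MNProxyPool and connect_fn are invented names for this sketch, not any real library; a real version would hand out DBI/MySQL handles over the network):

```python
import queue

class MNProxyPool:
    """Sketch of an M:N multiplexer: any number of client threads
    share at most n real database connections. connect_fn stands in
    for whatever actually opens a handle (e.g. DBI->connect)."""

    def __init__(self, n, connect_fn):
        self._conns = queue.Queue(maxsize=n)
        for _ in range(n):
            self._conns.put(connect_fn())

    def run(self, fn):
        # Block until one of the n real connections is free,
        # run the work against it, then return it to the pool.
        conn = self._conns.get()
        try:
            return fn(conn)
        finally:
            self._conns.put(conn)
```

The blocking get() is what serializes M clients over N connections; everything else (wire protocol, transaction affinity, auth) is where the real work lives.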

x86 sucks. (Well, for addressing lots of memory.) There's a patch (in -aa) to change the default kernel/user split to 0.5:3.5 GB, but the InnoDB manual states that glibc is still buggy and the stack and heap can collide if you allocate more than 2GB. But perhaps the glibc they statically link is aware of the mapping changes in the -aa patch? I'd hope so.

So say I have a machine with 4GB. I can safely have the database address 2GB. Factor in 500+ connections, taking 2.25 MB each. Remove 20 MB for Inno's dictionary pool and 30 MB for misc MyISAM table indices which can't run on Inno... down to 738 MB for InnoDB's data/key cache. Out of 4GB. Pathetic. See why I want the DBI proxy and more address space?
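The budget above works out roughly like this (a back-of-the-envelope sketch using the post's figures; "500+" is taken as 560 connections here, which is what makes the arithmetic land on 738):

```python
total_addr  = 2048   # MB: what the database can safely address
per_conn    = 2.25   # MB per connection (thread stack + buffers)
connections = 560    # the post's "500+", assumed ~560 for this sketch
dictionary  = 20     # MB: InnoDB dictionary pool
myisam_keys = 30     # MB: misc MyISAM index caches

innodb_cache = total_addr - connections * per_conn - dictionary - myisam_keys
print(innodb_cache)  # 738.0 MB left for InnoDB's data/key cache
```

Over a quarter of the address space goes to per-connection overhead, which is the motivation for both the proxy (fewer real connections) and a bigger address space.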

Sure, the rest of the memory will be used somewhat-usefully by the kernel's buffer cache, but not ideally. The database could utilize it better.

Unfortunately I don't have a spare machine with 4GB of memory to tinker with. I have three with 2 GB in the closet, but they all have motherboards with 2 GB limits.

Actually, I guess I do have a spare machine, since LJ just got two new databases... Waiting for lisa to get them online, then I can play.

Comments:
(Deleted comment)
From: brad
2003-05-09 06:01 pm (UTC)
If only that effect were more common ... :-)
From: octal
2003-05-09 05:50 pm (UTC)
I've got a quad-Xeon with 4GB RAM running Debian unstable (it belongs to some friends, and is at their house with a routed IP, but it's not doing anything specific right now, and I'm sure they'd be willing to help out), if you need a test platform.

From: brad
2003-05-09 06:10 pm (UTC)
I'd rather not screw it up. (and I would... the whole test is how hard can I push it until it falls over and dies... :-))

Our new machines should be up shortly, then I'll play there.

Thanks, though.
From: taral
2003-05-09 07:58 pm (UTC)
This is why I prefer PPC for real work.
From: evan
2003-05-09 10:32 pm (UTC)
What sort of high-end (ie database-quality) PPC machines are there?
From: taral
2003-05-10 12:04 am (UTC)
Depends on your price range. IBM pSeries comes to mind, but then again, I work with them daily. :)
From: jeffr
2003-05-10 01:32 am (UTC)
It seems to me that Power4 would be more appropriate given the extremely large L3 cache that you can get on them. They are prohibitively expensive though.

You could also do PAE or PSE36 on x86 (64GB of memory max), although that doesn't help a single process. FreeBSD supports this well now, but I don't know much about Linux. We also routinely change our kernel/user split, although mostly I go the other way, because most of the time I'm speeding things up by moving them into the kernel.

Having extra memory in your buffer cache is not likely to help for a well-written database. Most of the time they do direct I/O and use their own buffer management. Although I don't really know about MySQL etc.

I think your best bet would be to try the AMD 64bit processor. Lots of address space and a very familiar/compatible environment. It's also very fast, which many traditional 64bit risc processors are not.

Alpha has excellent integer performance and memory bandwidth as well as very large caches. It seems to me that these would be the most important elements for a fast database. Unfortunately alpha is very expensive and also EOL. :-( So sad.

The M:N thing doesn't sound so difficult though, eh? I imagine it is a solved problem, although admittedly I know little about real database implementations and issues.
From: brad
2003-05-10 03:31 pm (UTC)
Having extra memory in your buffer cache is not likely to help for a well-written database. Most of the time they do direct I/O and use their own buffer management. Although I don't really know about MySQL etc.

MySQL has pluggable table handlers. The InnoDB handler is awesome and does its own buffer management (and can do unbuffered I/O to raw partitions). The default handler (MyISAM) only caches its indexes, letting the OS cache data buffers.

The M:N thing doesn't sound so difficult though, eh? I imagine it is a solved problem, although admittedly I know little about real database implementations and issues.

There are 3 or 4 different options out there. They all have some notable problems, though, so I just have to find the one that's closest and commit some time to either fixing it or finding acceptable work-arounds.
From: jeffr
2003-05-10 01:38 am (UTC)
Statically link the binary. I just reread your post and realized what part of the issue is likely to be.

Shared libraries are handled by mmap()ing them into the process's address space and then resolving the symbols. mmap typically finds a spot somewhere in the middle of the user's address space and places the data there. After that it's pretty much a first fit algorithm in most operating systems. It's quite a bit more lame than you might expect.

Anyway, due to wacky placement of shared libraries you're given far greater space for stack than you're likely to need. Also, I wonder whether the glibc malloc uses mmap internally or sbrk. Many allocators mmap now so that they can return pages to the system when they no longer need them.

I actually did some work for another application that wanted to use up the full 2 gigs of user VA supplied to it. It was much more difficult than it should be. Statically linking the binaries is likely to take care of one possible issue, though: when you statically link, the text and data are placed with the text and data of the actual application, and not in the middle of the address space somewhere. You can also produce more efficient code this way if the resulting text is small, due to greater spatial locality, which improves cache performance and results in shorter jumps.
From: brad
2003-05-10 03:21 pm (UTC)
The binary is statically linked. I'm wondering if the warning in their docs is actually for people who haven't statically linked it.

I bought a support contract (they deserve it, anyway) so I'll write them some emails and see.
From: edm
2003-05-10 02:20 pm (UTC)

Database proxy

Have you looked at either:

DBI::Proxy and DBI::ProxyServer (in CPAN)

or:

SQLRelay (http://sqlrelay.sourceforge.net/)

I don't think either is exactly what you want as it stands, although both appear to work in a network mode somewhat like Apache::DBI (maintain a pool of already-open connections, and then match them up to incoming requests).

Depending on your exact load profile it might be that you can stand the TCP/IP open overhead of not using persistent connections on the Apache/mod_perl side (providing your network latency is low), so long as you don't have to pay the database setup/authentication overhead.

If not, I suspect that DBI::Proxy/DBI::ProxyServer would be reasonably amenable to shoehorning in an M->N multiplexer, as it's probably got enough state to tell you when a connection is "hot" (ie, in use for a transaction) and when it's not (ie, sitting idle waiting for another request to come along).
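The hot/idle bookkeeping described here can be sketched quickly (a toy illustration only; TxnMultiplexer and the string backends are made up for the sketch): a backend connection is held only for the duration of a transaction, so idle clients consume no real connection at all.

```python
import contextlib
import queue

class TxnMultiplexer:
    """Sketch: a backend connection is 'hot' only while a client
    holds an open transaction; otherwise it sits in the idle pool."""

    def __init__(self, backends):
        self._idle = queue.Queue()
        for b in backends:
            self._idle.put(b)

    @contextlib.contextmanager
    def transaction(self):
        conn = self._idle.get()    # connection goes hot
        try:
            yield conn             # client runs its statements
        finally:
            self._idle.put(conn)   # back to idle on commit/rollback
```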

I was also going to suggest looking at:

DB Balancer (http://dbbalancer.sourceforge.net/)

(the Debian package is maintained by a friend of mine), but I see that is PostgreSQL only at present, so won't directly help you.

Completely off the top of my head, I'd guess that a good perl/network/database hacker ought to be able to build you an M->N database proxy with on the order of a week or two of work, at least within your likely constraints (eg, almost all connections have the same database access privileges, and you could arrange for the few exceptions to connect some other way than through the proxy).
From: brad
2003-05-10 03:27 pm (UTC)

Re: Database proxy

As I replied to you before you got an LJ account, SQLRelay looks great. Unfortunately, it looks like it'll take some work to remove the DBD::MySQL-isms from the code and make some helper functions to detect the $dbh sub-type (DBD::mysql vs DBD::sqlrelay) and do the right thing.

An SQLRelay proxy that looked to clients just like a MySQL server would be ideal for me, but the way they've done it is definitely better.

It looks like I'll also have to modify their server's client driver to support auto-commit.

MySQL provides a way to limit max connections. Each one takes up (thread_stack + record_buffer + sort_buffer). With their binaries, the thread_stack is 256k... quite acceptable. The record_buffer and sort_buffer are 1MB each, though.

I'm wondering if there's an option to limit max *running* threads, so instead of:

connections * (thread_stack + record_buffer + sort_buffer)

I'd have:

connections * thread_stack + max_running * (record_buffer + sort_buffer)

Then I wouldn't need a relay at all.
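Plugging numbers into the two formulas above makes the difference concrete (the stack and buffer sizes are from the comment; max_running = 50 is a made-up figure for illustration):

```python
thread_stack  = 0.25   # MB (256k, from MySQL's binaries)
record_buffer = 1.0    # MB
sort_buffer   = 1.0    # MB
connections   = 500
max_running   = 50     # hypothetical cap on concurrently running threads

# Today: every connection pays for all three.
per_connection_all = connections * (thread_stack + record_buffer + sort_buffer)

# Wished-for: only running threads pay for the big buffers.
capped_running = (connections * thread_stack
                  + max_running * (record_buffer + sort_buffer))

print(per_connection_all)  # 1125.0 MB
print(capped_running)      # 225.0 MB
```

Same 500 connections, but the memory bill drops by roughly a factor of five, which is why a max-running-threads knob would make the relay unnecessary.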

Anyway, things to look into.

I'll keep writing about it, since I know people are reading... :-)
From: chick
2003-05-10 03:04 pm (UTC)
No offence, but you look like that guy from N'Sync...very cute, the difference is that you have the looks and the brains.
From: krow
2003-05-10 04:58 pm (UTC)
Ignore Heikki on this one; he doesn't have a box to truly test this with (Monty has one, but he doesn't typically test this).
All of Slashdot's DBs are 4-gig machines, and I have had the pool sitting right at 2 gigs by itself at one point. Allocating more than 2 gigs of memory really is not an issue.