Comments:
ML? Scheme? Any higher-level functional language can do global analysis of this type. I'm studying compiler theory now, and "assignment" makes static analysis a royal pain in the ***.
Sounds a lot like the speculative load instructions used by IA-64 (Itanium) and other modern experimental processors, wherein the load-from-memory instruction is split into two parts: a load-from-memory part and a commit-to-register part. Although the main reason for doing that has to do with branching.
It seems like if you were in a language with first-class functions and closures, you could have your function foo() preload the data out of the DB in the background, and return a function that does the actual work. If you were to call the returned function before the preload was completed, it'd just block until the data was available. This keeps you from having local data copied back and forth between a bunch of functions... so your code would look something like
foo()();
for a single call to foo, and
t1 = foo(); t2 = bar(); t1(); t2();
for calls to both foo and bar. The advantages here are somewhat readable code, it being fully re-entrant, and it doesn't have to pass local variables around because they sit in the returned functions. I vaguely recall doing something like this in LISP. I think you could also do this in Java by returning an anonymous inner class, but you'd be inflicting great pain and lossage on yourself moving to Java, Tomcat and company.
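The closure idea above can be sketched concretely. This is a Python rendering (Python just for concreteness; the names make_preloader and slow_load are mine, not from the thread): the outer call starts the load in a background thread and hands back a thunk that blocks only if the load hasn't finished yet.

```python
import threading

def make_preloader(load_fn, *args):
    """Start load_fn(*args) in the background; return a thunk that
    blocks until the result is ready, then returns it."""
    result = {}
    done = threading.Event()

    def worker():
        result["value"] = load_fn(*args)
        done.set()

    threading.Thread(target=worker).start()

    def thunk():
        done.wait()              # block until the preload completes
        return result["value"]

    return thunk
```

With this, `t1 = make_preloader(load_foo); t2 = make_preloader(load_bar); t1(); t2()` overlaps both loads, matching the `t1 = foo(); t2 = bar(); t1(); t2();` pattern above.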
Alternately, you could just fork off two threads, one to execute foo and another to execute bar.
Lastly, in the realm of pure fantasy, you could add an instruction-reordering pass to the Perl bytecode compiler; it could inline foo() and bar() and then reorder the ops in such a way that the DB requests were up top and the work down below.
It seems that in all these cases you're using threads, and you'd need to worry about the overhead of synchronization and multiple threads.
From: brad 2004-02-06 11:30 am (UTC)
Re: Musings
It seems like if you were in a language with first-class functions and closures, you could have your function foo() preload the data out of the DB in the background, and return a function that does the actual work.
Perl does have closures, btw.
If you were to call the returned function before the preload was completed it'd just block until the data was available.
That's totally the .NET async model without the verbose .BeginInvoke/.EndInvoke and all the IAsyncResult stuff. Thread-hungry, though.
But I don't want to just do it in the background, I want to merge the data loading with other data loads. (though threads would at least kill the latency problem, I suppose, but it's not as clean)
Inlining everything doesn't solve it, because we don't know until runtime what the execution path will be, and what we'd need to preload.
I want to merge the data loading with other data loads.
Forgive me, I haven't done much relational database work. But if you're making two unrelated requests, or nearly unrelated requests, what's the advantage of merging them? It was my understanding that once you opened a connection to the DB it remained open for the duration of the Perl session unless explicitly closed. And if it's similar DB requests across different Perl sessions (i.e. page loads), I thought that was what memcached was for.
Alternately... and here's where it gets sticky. You could have foo() and bar() return functions as above... but instead of having foo() and bar() make the request, they add their requests to a queue. And then the first call to one of the returned functions hits the DB for all of the requests. Ex:
t1 = foo(); # adds foo's DB work to the queue
t2 = bar(); # adds bar's DB work to the queue
t1();       # hits the DB for both requests
t2();       # doesn't hit the DB because it already has the results from the above call
this eliminates all the bytecode-level hacking, and keeps you from having to maintain meta-data on your functions (which could be a pain). And of course you could still do
foo()();
and have it do the right thing.
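The queue idea above can be sketched like so (a Python sketch; BatchQueue and fetch_many are hypothetical names, not an existing API): enqueue() registers work and returns a thunk, and the first thunk that gets forced flushes the whole queue in one backend hit.

```python
class BatchQueue:
    """Collect pending keys; the first force() fetches everything queued
    in one round trip, later force() calls reuse those results."""

    def __init__(self, fetch_many):
        self.fetch_many = fetch_many  # takes a list of keys, returns a dict
        self.pending = []
        self.results = None

    def enqueue(self, key):
        self.pending.append(key)

        def thunk():
            if self.results is None:
                # one round trip for everything queued so far
                self.results = self.fetch_many(list(self.pending))
            return self.results[key]

        return thunk
```

Then `t1 = q.enqueue("foo"); t2 = q.enqueue("bar"); t1()` hits the backend for both requests, and `t2()` is served from the earlier batch, exactly as in the comment sequence above.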
From: brad 2004-02-06 04:37 pm (UTC)
Re: Musings
But if you're making two unrelated requests, or nearly unrelated requests, what's the advantage of merging them?
Network round-trips.
Instead of:
SEND: I want A.
RECV: I got A.
SEND: I want B.
RECV: I got B.
It'd be:
SEND: I want A and B.
RECV: Get A and B.
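The saving is purely in round trips: n separate requests pay n round-trip times, while the merged request pays one. A toy calculation (the 0.2 ms round-trip time is an assumed LAN figure, not a measurement from the thread):

```python
RTT_MS = 0.2                        # assumed LAN round-trip time (hypothetical)
n_items = 2                         # "I want A" and "I want B"
sequential_ms = n_items * RTT_MS    # one round trip per item
merged_ms = 1 * RTT_MS              # "I want A and B" in one round trip
print(sequential_ms, merged_ms)
```

Per request the difference is tiny, but it scales linearly with the number of items fetched per page, which is the aggregate saving discussed below.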
Of course, we're upgrading our whole internal network from 100 Mbps to gigabit now, so the round-trip cost will be a lot less, so I'll probably just ignore the whole issue.
From: evan 2004-02-06 11:44 am (UTC)
laziness
Haskell is the only popular language I know that has real lazy data everywhere, but I imagine it falls under the category of "academic wanking". I've seen laziness in ML done like this (converted to Perl for you):

$x = lazy sub { 2 + 2 };
$x->force; # actually runs the sub and returns 4
$x->force; # returns 4 again; doesn't run the sub

I'm sure you could implement the function "lazy" trivially. I'd be surprised if such a Perl module didn't exist already. Languages that do this internally "just" put the "lazy" before every variable, and "force" after every reference. (It's much harder than that, of course, if you want it to be at all performant.) But this sort of laziness isn't the kind you need, because it isn't network-aware.
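The lazy/force pair is indeed trivial to implement. Here's a minimal sketch in Python (the class name Lazy is mine): a memoizing thunk that runs its function on the first force and caches the value afterward.

```python
class Lazy:
    """A thunk: force() runs the function once and caches the result."""
    _UNSET = object()

    def __init__(self, fn):
        self.fn = fn
        self.value = self._UNSET

    def force(self):
        if self.value is self._UNSET:
            self.value = self.fn()  # runs only on the first force
        return self.value

x = Lazy(lambda: 2 + 2)
print(x.force())  # actually runs the sub: 4
print(x.force())  # 4 again; doesn't run the sub
```

As the comment notes, this kind of laziness only defers and caches work; it knows nothing about the network, so it can't merge two pending fetches into one request.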
From: brad 2004-02-06 12:01 pm (UTC)
Re: laziness
The "Memoize" module does that, kinda. It caches the return values of subs so they're not evaluated multiple times with the same args.
Really I just want to make a function:
string HTTP_response (string HTTP_request) {
    ...
    return force ...
}
And everything it calls inside that is lazy. At the very end, at "return", the language starts figuring out a plan to evaluate everything underneath, including merging subs together. Say functions foo() and bar() both call memcached_load(@args). If the execution tree calls both foo() and bar() after each other, and memcached_load(@args) has some attribute function which maps:
memcached_load(@args) MERGE WITH memcached_load(@args) -> memcached_load(@args), the compiler could join 'em.
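That merge rule amounts to deduplicating identical calls in the evaluation plan. A sketch in Python (plan_and_merge and the (name, args) tuples are hypothetical, standing in for whatever the compiler's plan representation would be):

```python
def plan_and_merge(calls):
    """calls: ordered (function_name, args) pairs the plan would make.
    An identical memcached_load call merges with the earlier one;
    everything else passes through untouched."""
    merged, seen = [], set()
    for name, args in calls:
        key = (name, args)
        if name == "memcached_load" and key in seen:
            continue  # MERGE WITH the identical earlier call
        seen.add(key)
        merged.append(key)
    return merged

plan = [("memcached_load", ("user:1",)),
        ("render", ("page",)),
        ("memcached_load", ("user:1",))]
print(plan_and_merge(plan))  # the duplicate load is gone
```

A fuller version would also merge *different* memcached_load calls into one multi-key fetch, which is where the round-trip saving discussed above comes from.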
Hell, I'm rushing this because everybody here wants to go eat.
From: denshi 2004-02-06 02:51 pm (UTC)
Re: laziness
If you could encapsulate the lazy area like so, maybe you could hand it off to an embedded lazy language, like Haskell (the runtime's not that large and some are written in C). But Haskell doesn't have the parallelism... Have you looked at Erlang?
For ease of implementation, I think gen_witt may be on the right track.
I once designed an entire language around lazy evaluation. As in, you essentially make each variable a... generated function call, and evaluating it calls that function. It's fairly clever, and you can do some really amazing stuff with it, but the interpreter never got far enough along to be incredibly useful. But it was a good idea, and I still hope to implement it some day. The cool thing is when you get into lazy evaluation of things which generate lists on the fly. It's really just a function, so it's a bit like doing a functional programming language without variables: all functions, insane call graphs, but you only evaluate as used. So you write your whole program around your results, even more so than most functional people, because they still go line by line, whereas in this semi-vaporware language of mine you really wrote from the result backwards, since that's how it ran, effectively. Heh.
Like this? Sure, it's not automatic... but I guess you can't get much easier than that.
From: brad 2004-02-07 08:53 pm (UTC)
Re:
That's neat, for forking, but I don't want to fork. I just want to skip some unnecessary work by knowing what'll come later in different paths and do it all at once ahead of time, with less latency.
Does it matter if it's asynchronous or not?
From: brad 2004-02-07 09:02 pm (UTC)
Re:
No threads, no forking.
Let me be a little more concrete. Read the example I gave again with some more context...
The data fetching across the network is memcached. The memcached execution time is near-instant. The overhead is the packet round-trip times. If we can fetch two items at once instead of fetching them one-at-a-time, we speed things up.
Forking and threading are too heavy here. We're talking about saving tiny amounts of time. But in aggregate, it'd be nice.