Time | S | Nick | Message
00:26 |
|
|
mezod_ joined #rest |
00:50 |
|
|
vanHoesel joined #rest |
01:07 |
|
|
shrink0r joined #rest |
01:13 |
|
|
warehouse13 joined #rest |
01:16 |
|
|
begriffs joined #rest |
01:52 |
|
|
mezod joined #rest |
02:51 |
|
|
lemur joined #rest |
03:09 |
|
|
ewalti joined #rest |
03:39 |
|
|
fumanchu_ joined #rest |
04:07 |
|
|
mgomezch joined #rest |
04:23 |
|
|
ewalti joined #rest |
04:45 |
|
|
_ollie joined #rest |
05:03 |
|
|
fumanchu joined #rest |
05:05 |
|
|
mgomezch_ joined #rest |
05:09 |
|
|
jaawerth_ joined #rest |
05:09 |
|
|
zama_ joined #rest |
05:50 |
|
|
shrink0r joined #rest |
07:28 |
|
|
ewalti joined #rest |
07:57 |
|
|
dEPy joined #rest |
08:28 |
|
|
vanHoesel joined #rest |
08:41 |
|
|
vanHoesel joined #rest |
08:48 |
|
|
azer_ joined #rest |
09:29 |
|
|
ewalti joined #rest |
09:52 |
|
|
shrink0r joined #rest |
09:56 |
|
|
vanHoesel joined #rest |
10:09 |
|
|
Left_Turn joined #rest |
11:29 |
|
|
azr_ joined #rest |
11:37 |
|
|
Left_Turn joined #rest |
11:43 |
|
pdurbin |
whartung: I miss having you in a channel I log where I can pick your brain about EJB. |
11:44 |
|
trygvis |
EJB? people still use that? :) |
11:44 |
|
pdurbin |
sigh. what do you use trygvis? |
11:45 |
|
trygvis |
often spring and spring-mvc |
11:46 |
|
pdurbin |
so you would never use something like javax.ejb.TransactionAttributeType.REQUIRES_NEW |
11:47 |
|
trygvis |
we do, but it's all handled by spring |
11:48 |
|
trygvis |
but we rarely do that, it usually leads to clusterfucks |
11:48 |
|
pdurbin |
hmm. well, starting to use that reduced the time a method took from 3.5 *hours* to 15 *minutes* |
11:48 |
|
pdurbin |
here's the commit: https://github.com/IQSS/dataverse/commit/a1d9da4 |
11:49 |
|
pdurbin |
I also had to put the method I was calling (indexAll) into a new separate bean so there would be an EJB boundary so the @TransactionAttribute(REQUIRES_NEW) annotation actually has an effect |
11:49 |
|
pdurbin |
this is all very strange and mysterious and magical and spooky to me |
11:49 |
|
pdurbin |
and whartung usually has good insight on this stuff :) |
11:51 |
|
trygvis |
to me the main difference between spring and spec ejb is that you get to move faster, and you don't try to stick to a spec that'll only give you a need for workarounds |
11:51 |
|
trygvis |
on the spring side you can end up debugging more stuff, but for us it is worth it |
11:52 |
|
trygvis |
we use jpa annotations, but we know we're using hibernate so we sneak in some hibernateisms once in a while |
11:52 |
|
|
mezod joined #rest |
11:53 |
|
pdurbin |
fair enough, but you haven't heard of this problem? the idea is that the method is building up a huge single transaction and gets slower and slower as it runs. and the fix is to introduce an EJB boundary and annotate the methods called by the main method with "force a new transaction" |
11:54 |
|
pdurbin |
indexAll calls indexDataset over and over, for example. so you annotate indexDataset with the "force new transaction" magic |
11:55 |
|
trygvis |
usually you want to build big transactions, up to a certain size, to prevent excessive disk flushing |
11:56 |
|
trygvis |
indexAll might sound like something that hits the "up to a certain size" limit |
11:56 |
|
pdurbin |
huh. well in this case the big single transaction seemed to be killing performance |
11:57 |
|
trygvis |
perhaps you want to configure batch size in your jpa provider |
11:57 |
|
trygvis |
are you running out of memory? |
11:58 |
|
pdurbin |
not sure. knowing what I know now, that would be a good thing to look at |
11:58 |
|
trygvis |
try this: if(dataverseIndexCount % 1000 == ) entityManager.flush(); |
11:58 |
|
trygvis |
if(dataverseIndexCount % 1000 == 0) entityManager.flush(); |
11:59 |
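The cadence trygvis suggests — flush the persistence context every N writes instead of letting one transaction accumulate everything — can be simulated in plain Java. The batch size and counters here are illustrative, with the real `entityManager.flush()`/`clear()` calls shown as comments:

```java
public class FlushCadence {
    static final int BATCH_SIZE = 1000;

    // Returns how many flushes a run of `totalWrites` writes would trigger
    // with the pattern suggested above: if (count % 1000 == 0) entityManager.flush().
    public static int flushesFor(int totalWrites) {
        int flushes = 0;
        for (int count = 1; count <= totalWrites; count++) {
            // ... persist/merge one entity here ...
            if (count % BATCH_SIZE == 0) {
                // entityManager.flush();  // push the pending SQL to the database now
                // entityManager.clear();  // optionally detach managed entities to cap memory
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        // ~1600 writes, as in the indexAll case: one mid-run flush at write 1000,
        // plus whatever the final transaction commit flushes at the end.
        System.out.println(flushesFor(1600));
    }
}
```

Adding `clear()` after each `flush()` is what caps the persistence context's memory footprint; `flush()` alone pushes SQL out but keeps every entity managed.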
|
pdurbin |
flush every so often, you're saying |
11:59 |
|
trygvis |
yep |
11:59 |
|
pdurbin |
is all this stuff covered in the Java EE tutorial? or elsewhere? |
11:59 |
|
trygvis |
but, where are you actually using time or cpu? it seems you're reading from a database just to stuff it in a solr index |
12:00 |
|
trygvis |
http://stackoverflow.com/questions/9994699/solr-reindex-recommended-batch-size |
12:00 |
|
trygvis |
usually you don't want to commit often, only send to the server |
12:01 |
|
trygvis |
but you have two methods there that I have no idea what they do: indexDataset() and indexDataverse() |
12:01 |
|
pdurbin |
at a high level, yes, the work is to read from postgres and put some data into solr |
12:01 |
|
trygvis |
postgresql <3 |
12:01 |
|
trygvis |
ok, so if you're writing to solr you don't want to flush the entityManager (it doesn't have any changes to flush) but flush solr |
12:01 |
|
pdurbin |
:) |
12:02 |
|
pdurbin |
oh oh oh, sorry |
12:02 |
|
pdurbin |
we do write to postgres too |
12:02 |
|
trygvis |
you could possibly also just run without a transaction as you're rebuilding everything |
12:02 |
|
trygvis |
s,as,if, |
12:03 |
|
trygvis |
it is usually like running with scissors, but if your database is empty it's usually ok :) |
12:03 |
|
pdurbin |
we write to postgres the timestamp at which we indexed into solr without error. we store this timestamp so later we can compare it to another timestamp to see if solr has the latest data. to see if solr is in sync with the data in postgres |
12:04 |
|
trygvis |
ok, then you want to flush the entity manager for every thousand writes |
12:04 |
|
pdurbin |
before the fix the writing of the timestamps to the database was being done in a single transaction and the process/job was getting slower and slower as it ran |
12:05 |
|
pdurbin |
that makes sense but this should have only been about 1600 writes in total |
12:05 |
|
pdurbin |
so it's shocking that it was taking 3.5 hours |
12:06 |
|
trygvis |
but it is strange that adding that tx boundary made it all better, that means that the reindexing process isn't what is causing the badness |
12:06 |
|
pdurbin |
oh I suspect there's more badness I haven't found yet |
12:06 |
|
pdurbin |
it's all very spooky |
12:07 |
|
trygvis |
it could be as easy as doing a flush before you call indexAll() |
12:11 |
|
pdurbin |
yeah, could try adding flush |
12:15 |
|
|
vanHoesel joined #rest |
12:21 |
|
|
interop_madness joined #rest |
12:57 |
|
|
vanHoesel joined #rest |
13:02 |
|
|
composed joined #rest |
13:02 |
|
composed |
Hmm Douglas Crockford says the statelessness of HTTP was its biggest mistake. |
13:02 |
|
composed |
And talks about WebSockets with Node.JS |
13:04 |
|
composed |
WebSockets are interesting. One sends an HTTP request negotiating to drop to a *lower* level of protocol |
13:04 |
|
composed |
Our stack has many warts folks |
13:05 |
|
asdf` |
i'm not sure i'd call websockets 'lower level' |
13:07 |
|
composed |
asdf`: they are much closer to TCP than HTTP |
13:07 |
|
composed |
asdf`: but they're layered on top of HTTP because HTTP is everywhere |
13:11 |
|
trygvis |
composed: statelessness of HTTP is one of its biggest features, and required when trying to build "the web" |
13:12 |
|
trygvis |
comparing streaming messages from one node to another to http is apples to oranges |
13:12 |
|
composed |
trygvis: well maybe it was required in the 80s |
13:12 |
|
composed |
trygvis: right now we have movements like "encrypt everything" |
13:13 |
|
composed |
trygvis: so little to nothing of "the web" as originally planned gets used, and there are tons of security measures to avoid mixing domains for the rest of it because of security implications |
13:13 |
|
trygvis |
if HTTP becomes a problem, some other technology will prevail |
13:13 |
|
bigbluehat |
composed: where'd you see the crockford quote? |
13:13 |
|
bigbluehat |
sounds like he believes the network is always available |
13:14 |
|
composed |
bigbluehat: why would he believe that |
13:14 |
|
bigbluehat |
yeah...good question |
13:14 |
|
trygvis |
available, reliable, fast, etc |
13:14 |
|
bigbluehat |
non-latent |
13:14 |
|
trygvis |
shiby! |
13:14 |
|
composed |
bigbluehat: so you pose the question and ponder the answer. You strawmanned Crockford :P |
13:14 |
|
trygvis |
err, shiny! |
13:15 |
|
bigbluehat |
hehe |
13:15 |
|
composed |
Using a stateful connection doesn't mean it can't tolerate disruption. |
13:15 |
|
bigbluehat |
true. |
13:15 |
|
composed |
It just means there's a well identifiable session, and if the session ends, it can start over later |
13:16 |
|
bigbluehat |
and you have shared state |
13:16 |
|
bigbluehat |
so your state machine has 2 heads |
13:16 |
|
bigbluehat |
composed: curious about your "little to nothing of "the web" as originally planned..." comment |
13:16 |
|
* bigbluehat |
being a fan of RFC 2068 ;) |
13:17 |
|
composed |
bigbluehat: shared state typically refers to some state that multiple entities can mutate independently. |
13:17 |
|
_ollie |
what Douglas Crockford effectively says: "HTTP's biggest mistake is that it's not a solution to a problem I have" which is BS :) |
13:17 |
|
composed |
Because if shared state simply means "two things know about one thing", then the web is also one giant shared state |
13:18 |
|
trygvis |
_ollie: +1! |
13:18 |
|
|
vanHoesel joined #rest |
13:18 |
|
bigbluehat |
anyone have a URL for these statements by crockford? |
13:18 |
|
composed |
_ollie: how'd you implement website login without session cookies, which represent the client emitting state to the server to manage |
13:19 |
|
composed |
bigbluehat: one sec |
13:19 |
|
bigbluehat |
"URL or it didn't happen" ;) |
13:19 |
|
trygvis |
bigbluehat: welcome to #rest :) |
13:19 |
|
bigbluehat |
composed: I don't think I'd say that state is shared anywhere on the web |
13:19 |
|
composed |
bigbluehat: minute 41-42 and onward https://www.youtube.com/watch?v=QgwSUtYSUqA |
13:19 |
|
bigbluehat |
it's transferred...but not shared |
13:19 |
|
bigbluehat |
tnx trygvis :D |
13:19 |
|
bigbluehat |
...been too long |
13:20 |
|
trygvis |
you've been missing out on some real trolls |
13:20 |
|
bigbluehat |
aw man... ;) |
13:20 |
|
composed |
bigbluehat: it comes down to what he says: it's easy to decide "everything is stateless" because it makes it easy to write a standard that's stateless |
13:20 |
|
composed |
But it's impossible to write a simple website login without SOME shared state |
13:21 |
|
composed |
Like an emitted session id |
13:21 |
|
_ollie |
composed: it seems like you misunderstood the statelessness requirement in REST… |
13:21 |
|
bigbluehat |
is it? |
13:21 |
|
composed |
The trick with designing big networked systems is not to eliminate state but to isolate it. |
13:22 |
|
composed |
There will be some component that is aware of state. The rest maybe won't be |
13:22 |
|
composed |
HTTP says the client can be that component |
13:22 |
|
bigbluehat |
hilarious statements by Mr. C. |
13:22 |
|
composed |
But truth is the client can't have ALL the state, or basically every service that has a user account can't exist. |
13:22 |
|
bigbluehat |
"HTTP was designed completely wrong" |
13:22 |
|
_ollie |
REST requires the request to contain all necessary information and the server not magically identifying it by some means, that's all… |
13:22 |
|
_ollie |
thus, a cookie is perfectly fine |
13:23 |
|
composed |
bigbluehat: question is why. "Hilarity" is not an objective metric for correctness |
13:23 |
|
bigbluehat |
no...it's terribly objective ;) |
13:23 |
|
bigbluehat |
HTTP doesn't work for the way he wants to write apps |
13:23 |
|
bigbluehat |
...it doesn't mean it was designed "completely wrong" |
13:24 |
|
bigbluehat |
it was just designed for a different thing than he's using it for |
13:24 |
|
composed |
bigbluehat: no I'm serious, I prefer we don't fall to this immature level of analysis. Let's have some argument from an engineering point of view. |
13:24 |
|
bigbluehat |
so *he* wants something else |
13:24 |
|
bigbluehat |
aka...it wasn't designed for *him* |
13:24 |
|
bigbluehat |
^^ just did ;) |
13:24 |
|
composed |
_ollie: fielding says cookies directly oppose the REST style. |
13:25 |
|
composed |
bigbluehat: "I think he's funny" is not an engineering viewpoint I'm afraid. He has very specific arguments about constantly passing back and forth context that doesn't change with stateless designs, which is a real bottleneck |
13:26 |
|
trygvis |
composed: if you think you need a session id you're starting off wrong |
13:26 |
|
composed |
Ideally you want to be able to start over a session, but there's no need to eliminate sessions entirely. |
13:26 |
|
_ollie |
composed: where? |
13:26 |
|
composed |
trygvis: ok I'm asking you, how do you do it right |
13:26 |
|
composed |
trygvis: how do you log into a site without a session id that you pass back every time |
13:27 |
|
trygvis |
again, you're starting off wrong. I don't log into a site |
13:27 |
|
trygvis |
I supply credentials on every request (like the spec caters for) |
13:27 |
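Supplying credentials on every request, as trygvis describes, is ordinary HTTP Basic auth. A minimal sketch with Java's built-in `HttpClient` — the host and credentials are made up, and the request is only built here, never sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthRequest {
    // Build the Authorization header value for HTTP Basic auth:
    // "Basic " + base64("user:password").
    public static String basicAuth(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // Every request carries the credentials; the server holds no session state.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/favorites"))
                .header("Authorization", basicAuth("alice", "s3cr3t"))
                .GET()
                .build();
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```

This is also why composed's caching objection holds for shared proxies: the Authorization header varies per user, so intermediaries won't serve one user's cached response to another unless the server explicitly allows it.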
|
composed |
trygvis: fine, I want to have favorite channels on YouTube without having a YouTube-specific app. How do we do this. |
13:28 |
|
composed |
trygvis: if you supply credentials on *every* request, it means you can't cache *any* response at a proxy |
13:28 |
|
composed |
Because credentials will be unique |
13:28 |
|
composed |
trygvis: furthermore a session id is in fact "credentials" |
13:28 |
|
composed |
So it's the same thing by another name |
13:29 |
|
trygvis |
if the server wants me to read some shared, public, cacheable data it can point me to another host (realms are per host) |
13:29 |
|
trygvis |
you can implement it like that, but it is not the same thing |
13:30 |
|
composed |
trygvis: fine, so after all we need to split things by use case, and one of those use cases absolutely needs statefulness |
13:31 |
|
composed |
trygvis: and the same use case that requires statefulness (session, credentials, security, domain isolation) is curiously the same exact use case to have an API for. Not many APIs do much useful without credentials. |
13:31 |
|
composed |
so what is the bottom line here. Crockford is not so hilarious after all |
13:31 |
|
trygvis |
jeez, it's you again |
13:32 |
|
composed |
what? |
13:37 |
|
|
azr joined #rest |
13:53 |
|
|
shrink0r_ joined #rest |
14:07 |
|
|
ewalti joined #rest |
14:43 |
|
|
nkoza joined #rest |
15:07 |
|
|
ewalti joined #rest |
15:07 |
|
|
ewalti joined #rest |
15:09 |
|
|
ewalti joined #rest |
15:25 |
|
|
ewalti joined #rest |
15:46 |
|
|
JudasBricot joined #rest |
16:05 |
|
|
azr joined #rest |
16:20 |
|
whartung |
hey pdurbin |
16:28 |
|
|
ewalti joined #rest |
16:32 |
|
|
azr joined #rest |
16:32 |
|
pdurbin |
whartung: hey. trygvis talked me off the ledge. fun with EJB |
16:32 |
|
whartung |
yea I glanced at but then got hit with TL;DR |
16:34 |
|
pdurbin |
in short, I was surprised by the fix |
16:34 |
|
pdurbin |
which made things 14 times faster |
16:34 |
|
whartung |
care to summarize? |
16:34 |
|
|
ewalti joined #rest |
16:36 |
|
pdurbin |
basically indexAll was taking 3.5 hours on very little data. only creating ~1600 Solr documents based on data in postgres. as the Solr documents are created we update the database with a timestamp per row |
16:37 |
|
pdurbin |
the fix was to put indexAll in a new bean (to create an EJB boundary) and add @TransactionAttribute(REQUIRES_NEW) to the methods that indexAll is calling over and over, such as indexDataset |
16:37 |
|
pdurbin |
https://github.com/IQSS/dataverse/commit/a1d9da4 |
16:37 |
|
pdurbin |
this reduced the time from 3.5 hours to 15 minutes |
16:38 |
|
pdurbin |
I guess the thing that disturbs me is that this fix is completely unintuitive to me. |
16:38 |
|
whartung |
how did this speed it up? Is Solr part of the transaction? |
16:38 |
|
pdurbin |
It's like magic. |
16:38 |
|
pdurbin |
I don't know if Solr is part of the transaction. |
16:38 |
|
pdurbin |
I think I need to study EJB transactions. |
16:38 |
|
trygvis |
pdurbin: I doubt that your solution is a 'correct' fix |
16:39 |
|
pdurbin |
ok, let's call it a solution then :) |
16:39 |
|
trygvis |
I doubt it is unless you have done some magic to configure solr as a part of your XA |
16:39 |
|
trygvis |
it'll float your boat for "a while" :) |
16:39 |
|
trygvis |
but I'm out |
16:40 |
|
whartung |
yea that doesn't make any sense whatsoever |
16:40 |
|
whartung |
where did you get the idea to even try it? |
16:40 |
|
whartung |
and I'm skeptical that Solr is transactional |
16:41 |
|
pdurbin |
whartung: people here with way more experience with EJB than I have said something along the lines of, "We think indexAll is being treated as a single transaction. Let's add @TransactionAttribute(REQUIRES_NEW) and see if it helps." And it did. |
16:42 |
|
pdurbin |
before the solution, you could sort of tell that indexAll was getting slower and slower as it ran |
16:42 |
|
pdurbin |
I never witnessed the 3.5 hours it took. Too impatient. |
16:46 |
|
pdurbin |
I had figured out that writing those timestamps was slowing things down, but that's about it. |
16:46 |
|
pdurbin |
anyway, EJB moves in mysterious ways |
16:46 |
|
pdurbin |
and I should probably read a book about it |
16:47 |
|
whartung |
well, it doesn't, really. EJB is pretty bone stupid. |
16:47 |
|
whartung |
you were only updating 1600 rows? |
16:47 |
|
whartung |
Were you relying on the EntityManager to flush the updates? |
16:49 |
|
pdurbin |
about 1600 solr documents get created based on fewer rows than that in the database. let's say half, 800 rows in the database |
16:50 |
|
whartung |
but you update those rows, right? |
16:50 |
|
pdurbin |
right |
16:52 |
|
whartung |
and you're just using the entitymanager, right? fetch the entity, change it, and let the EM flush it when it's good and ready? |
16:52 |
|
pdurbin |
two timestamps actually. because each row in the database (more or less) becomes two solr documents. so we record a timestamp for each of the two solr documents per row |
16:52 |
|
whartung |
do you fetch all of your data upfront? |
16:55 |
|
pdurbin |
whartung: yes. I fetch a list of all datasets up front. |
16:55 |
|
pdurbin |
then iterate over them |
16:55 |
|
whartung |
are they eager? are there any lazy relationships? |
16:56 |
|
pdurbin |
I don't know. |
16:56 |
|
whartung |
well, do your root dataset rows relate to other rows, to other collections? |
16:56 |
|
pdurbin |
to update the timestamp I do use entitymanager. I do an em.merge |
17:30 |
|
|
fragamus joined #rest |
17:37 |
|
|
azr joined #rest |
17:42 |
|
|
shrink0r joined #rest |
18:31 |
|
pdurbin |
whartung: I think at some point you said I should read the EJB 3 JSR PDF. |
18:32 |
|
saml |
nooooooooooo |
18:32 |
|
saml |
EJB |
18:32 |
|
whartung |
heh. The JSR is interesting for sure, but it's a bit thick |
18:32 |
|
saml |
java |
18:32 |
|
|
ewalti joined #rest |
18:32 |
|
* fumanchu |
wonders what saml codes in |
18:32 |
|
pdurbin |
ok. maybe I'll read the 1000 page Java EE tutorial instead. :) |
18:33 |
|
saml |
node.js hehehhehehehehehehehe |
18:33 |
|
saml |
don't use node.js |
18:33 |
|
saml |
it hurts feelings |
18:33 |
|
saml |
do you use undertow? |
18:35 |
|
whartung |
Here's my thinking pdurbin |
18:35 |
|
whartung |
First, it has nothing to do with Solr |
18:35 |
|
pdurbin |
ok, nothing to do with solr. makes sense |
18:35 |
|
whartung |
That solr interface is not XA at all, again you can see that by the fact that you manually call commit. |
18:35 |
|
pdurbin |
yeah |
18:36 |
|
pdurbin |
I mean. it's a web service. It's like calling into the twitter api. |
18:36 |
|
whartung |
Not saying XA is impossible over HTTP, but…unlikely for a garden variety HTTP interface. |
18:36 |
|
whartung |
so with joa |
18:36 |
|
whartung |
jpa |
18:37 |
|
whartung |
when you're doing a bunch of changes, jpa caches its updates in ram. |
18:38 |
|
whartung |
nominally, it will flush all of the work on transaction commit. |
18:38 |
|
pdurbin |
ok. trygvis was asking if a lot of memory was being used |
18:38 |
|
whartung |
but you just said it was only 1600 rows |
18:38 |
|
whartung |
"that's nothing(™)" |
18:39 |
|
whartung |
sec... |
18:42 |
|
whartung |
so |
18:42 |
|
whartung |
simple case |
18:43 |
|
whartung |
you create 1000 entities, then the transaction commits, and 1000 insert statements flood out to the db server. |
18:43 |
|
whartung |
so, over time, a transaction can build up in ram, leaving a footprint. |
18:43 |
|
whartung |
but, 1600 rows isn't a lot, typically |
18:43 |
|
whartung |
now |
18:44 |
|
whartung |
the other time jpa pushes sql to the db is whenever it queries the server. |
18:44 |
|
whartung |
so you can load in a list of entities, change the entity, and then access the next one |
18:44 |
|
whartung |
but when you access the entity, it has lazy associations |
18:44 |
|
whartung |
which causes a new query in the background to hit the db server. |
18:45 |
|
whartung |
so when that happens, the pending updates will be flushed first. |
18:45 |
|
whartung |
so instead of 1000 inserts at the end of the xtn, you get your sql all mixed up, inserts interleaved with selects. |
18:46 |
|
whartung |
but even if that happens, while the transaction is open, the internal footprint will grow. |
18:46 |
|
whartung |
are you using postgres? |
18:46 |
|
pdurbin |
yes. postgres |
18:47 |
|
whartung |
are you updating a single row over and over and over? |
18:47 |
|
pdurbin |
not on purpose if I am |
18:47 |
|
whartung |
ok |
18:47 |
|
|
vanHoesel joined #rest |
18:47 |
|
pdurbin |
I mean, the row does get updated twice. |
18:47 |
|
pdurbin |
because we store two timestamps |
18:48 |
|
pdurbin |
each for a solr document that gets indexed |
18:48 |
|
whartung |
it's been a while since I tested, but in the past, updating a single row over and over and over in pg in a single xtn can be slow, because each update creates a new "ghost" row in the DB, and each new update has to crawl that list. So, if you updated a single row 1000 times, you end up having to scan 1000 rows for the next update. |
18:48 |
|
pdurbin |
contentIndexTime vs. permissionIndexTime timestamps. two of them. same row |
18:48 |
|
whartung |
but doesn't sound like that's happening here. |
18:49 |
|
* pdurbin |
is scared of ghost rows |
18:49 |
|
whartung |
nah, they go away on commit. |
18:49 |
|
whartung |
feature, not a bug. |
18:51 |
|
pdurbin |
phew |
18:51 |
|
whartung |
So, there's that. That suggests that in the large xtn scenario in your case, the overhead at the JPA/DB level of managing all that change is expensive. This would manifest as a pegged CPU, when in this case it shouldn't be -- it should be mostly I/O |
18:52 |
|
whartung |
all your work appears to be in a single thread, so contention doesn't seem like the issue. |
18:52 |
|
whartung |
as a simple test, you can try performing JUST the db operations (skip the solr calls) and see how long it takes. |
18:52 |
|
whartung |
3.5hrs is still a crazy number |
18:53 |
|
whartung |
in any case |
18:53 |
|
pdurbin |
yeah |
18:53 |
|
whartung |
updating 800 rows…big deal |
18:53 |
|
pdurbin |
right |
18:53 |
|
whartung |
"Oh no, all that data might almost fill a cache line in the CPU!" |
18:54 |
|
* whartung |
used to have 88k floppy disks |
18:54 |
|
whartung |
so that would be an intersting test |
18:56 |
|
whartung |
because all breaking up the xtn is doing is lowering the memory impact of the overall xtn |
18:57 |
|
pdurbin |
right. I mean, there are probably many ways to relieve the memory getting eaten up. Not that I've confirmed if it was memory or cpu. |
18:57 |
|
whartung |
how big are these documents? is the data in the DB the actual data, or just references to files? |
19:00 |
|
pdurbin |
the resulting Solr documents? not all that big I don't think |
19:00 |
|
whartung |
the rows in the db |
19:00 |
|
pdurbin |
oh, well, for every row we root around all over the db to gather the data required to construct the solr documents |
19:01 |
|
whartung |
ok, but you have "800" of them. How much data is one of those "800" |
19:02 |
|
pdurbin |
it's hard to answer but let's say not very much |
19:04 |
|
whartung |
ok |
19:04 |
|
whartung |
so you're not sucking in 1600 1M documents |
19:04 |
|
whartung |
sending your GC into a tizzy |
19:04 |
|
whartung |
that could be the other thing, need more memory, stuck in GC hell |
19:04 |
|
whartung |
be interesting to see the memory usage |
19:05 |
|
pdurbin |
the first solr doc I'm looking at is 71 lines of JSON |
19:06 |
|
whartung |
ooh |
19:06 |
|
pdurbin |
yeah, as I continue to dig into the performance problem I'll look at memory and cpu and whatnot. this was just crazy. the 3.5 hours thing. now down to 15 minutes by adding "require new transactions" |
19:06 |
|
whartung |
so basically by breaking up the xtn, each of those documents can come and go one by one rather than being all cached up waiting for the big single commit. |
19:07 |
|
whartung |
I'd still do the 'db only' test |
19:07 |
|
whartung |
for laffs |
19:07 |
|
pdurbin |
take solr out for a bit. i hear ya |
19:07 |
|
whartung |
15m for 800 documents is still kind of crazy, imho |
19:07 |
|
* pdurbin |
tells Solr it's not his fault |
19:07 |
|
* whartung |
… yet |
19:08 |
|
pdurbin |
oh sure. gotta make it faster still |
19:09 |
|
whartung |
I would think that solr would index those faster than that |
19:10 |
|
pdurbin |
solr is quite nice. I'm sure I'm just doing things wrong |
19:12 |
|
pdurbin |
I can't decide if EJB is quite nice. :) |
19:13 |
|
whartung |
my one complaint for EJB is that each EJB gets its own, private JNDI tree. |
19:13 |
|
whartung |
this may not be a problem with WAR deployments. |
19:13 |
|
whartung |
but when you start integrating EJBs from other jars, it's kind of a pain. |
19:14 |
|
whartung |
now, they DO have a new, "canonical" JNDI name |
19:14 |
|
whartung |
they're just awful names |
19:14 |
|
whartung |
seems to me EJBs in a war may have less of an issue with this. |
19:15 |
|
whartung |
and if you are using CDI for all your bean injections, it might be less of a problem -- I've not used it to that extent. |
19:15 |
|
whartung |
doing that would, ostensibly, solve many ills. |
19:16 |
|
whartung |
the JNDI tree is a legacy requirement since EJBs are individual deployable elements. |
19:16 |
|
whartung |
bbl afk lunch |
19:16 |
|
pdurbin |
bon appetit |
19:31 |
|
|
JudasBricot joined #rest |
19:34 |
|
|
shrink0r joined #rest |
20:42 |
|
|
vanHoesel joined #rest |
20:56 |
|
|
graste joined #rest |
21:12 |
|
|
vanHoesel joined #rest |
22:00 |
|
|
composed joined #rest |
22:10 |
|
|
vanHoesel joined #rest |
22:25 |
|
|
talios joined #rest |
22:28 |
|
|
vanHoesel joined #rest |
22:33 |
|
trygvis |
whartung: postgres has gotten a nice optimization for when non-indexed fields are updated, called HOT |
22:33 |
|
trygvis |
Heap Only Tuples |
22:33 |
|
whartung |
yea, ok |
22:33 |
|
trygvis |
dunno how it will satisfy MVCC at the same time, but anyway |
22:34 |
|
trygvis |
it also seems quite old: http://www.postgresql.org/message-id/27473.1189896544@sss.pgh.pa.us |
22:34 |
|
whartung |
MVCC is mostly about locking |
22:34 |
|
trygvis |
yes, but when a tx is updating a row I can't remember how postgresql does it. if it locks the row or not |
22:36 |
|
trygvis |
anyway, I'm off for tonight. later |
22:36 |
|
composed |
If it's an atomic operation it's by definition locked for a moment |
22:36 |
|
composed |
While updated |
22:37 |
|
whartung |
tt trygvis |
22:39 |
|
trygvis |
composed: no, it's not. but any other tx that wrote to it can't complete unless the earlier tx fails |
22:41 |
|
trygvis |
whartung: enjoy the troll |
22:41 |
|
* trygvis |
is out for real now! |
22:42 |
|
pdurbin |
trolling and running. I see how it is :) |
22:49 |
|
|
warehouse13 joined #rest |
23:17 |
|
|
_ollie joined #rest |
23:18 |
|
|
vanHoesel joined #rest |
23:37 |
|
|
rhyselsmore joined #rest |
23:37 |
|
|
rhyselsmore joined #rest |
23:42 |
|
|
vanHoesel joined #rest |