Time |
S |
Nick |
Message |
11:50 |
|
pdurbin |
listening to a new podcast about git. it's pretty good: http://episodes.gitminutes.com/2013/03/gitminutes-01-randal-l-schwartz-on.html |
17:30 |
|
|
aditsu joined #sourcefu |
17:31 |
|
aditsu |
pdurbin: hey, so I have some questions about faceted search |
17:31 |
|
aditsu |
I think it's simple and fun as long as you have global and independent facets |
17:31 |
|
pdurbin |
i'm up to my ears in faceted search |
17:32 |
|
pdurbin |
it is kinda fun |
17:32 |
|
aditsu |
but how do you deal with it if you have 1) facets that depend on the values of other facets and 2) facet values that depend on the values of other facets? |
17:34 |
|
pdurbin |
hmm, I'm not really doing that. but you can combine facets. solr seems to call this "pivot facets" |
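For reference, a minimal sketch of what a pivot facet request might look like, assuming Solr's `facet.pivot` parameter and the `collection1` core from the example setup. The field names `cat` and `manu` are assumptions about the example schema, and `pivot_facet_url` is a hypothetical helper, not part of any library. It only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def pivot_facet_url(base_url, pivot_fields, query="*:*"):
    """Build a Solr select URL that asks for a pivot (nested) facet.

    pivot_fields is an ordered list such as ["cat", "manu"]; Solr nests the
    counts for each later field under the values of the earlier ones.
    """
    params = {
        "q": query,
        "rows": 0,          # only facet counts are wanted, not documents
        "wt": "json",
        "facet": "true",
        "facet.pivot": ",".join(pivot_fields),
    }
    return base_url + "?" + urlencode(params)


# Assumed core name and field names; adjust to the schema actually in use.
url = pivot_facet_url("http://localhost:8983/solr/collection1/select", ["cat", "manu"])
print(url)
```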
17:34 |
|
pdurbin |
aditsu: have you spun up solr and played with Solritas - http://localhost:8983/solr/collection1/browse ? |
17:34 |
|
aditsu |
I barely heard of solr.. do you think I should use it? |
17:35 |
|
pdurbin |
maybe. you should at least spin it up and play around with that interface |
17:36 |
|
aditsu |
umm.. that localhost link doesn't seem very useful |
17:36 |
|
pdurbin |
and the admin gui at http://localhost:8983/solr |
17:37 |
|
pdurbin |
crimsonfubot: lucky solr 4 tutorial |
17:37 |
|
crimsonfubot |
pdurbin: http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html |
17:37 |
|
pdurbin |
bad crimsonfubot |
17:37 |
|
pdurbin |
use this one: http://lucene.apache.org/solr/4_2_0/tutorial.html |
17:37 |
|
aditsu |
ah, you mean after I run it |
17:38 |
|
pdurbin |
yeah |
17:38 |
|
pdurbin |
java -jar start.jar |
17:38 |
|
pdurbin |
then |
17:38 |
|
pdurbin |
java -jar post.jar *.xml |
17:38 |
|
aditsu |
what the... 111MB |
17:40 |
|
pdurbin |
maybe elastic search is smaller |
17:42 |
|
pdurbin |
aditsu: I mentioned your question at http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2013-04-02#l74 |
17:44 |
|
aditsu |
ok I got solr running |
17:44 |
|
pdurbin |
I find that Solritas interface a nice starting point for faceting... what it is... how you might implement faceting. I'm also keeping a list of open source projects with faceting at http://wiki.greptilian.com/search/faceted |
17:46 |
|
aditsu |
so what does it index? |
17:46 |
|
aditsu |
xml files? |
17:46 |
|
pdurbin |
good question |
17:47 |
|
pdurbin |
so far I've only fed xml files to solr |
17:47 |
|
pdurbin |
did you feed it the example xml files? about electronics? |
17:48 |
|
aditsu |
yeah I did java -jar post.jar *.xml |
17:48 |
|
pdurbin |
cool, so you have some data to play with |
17:49 |
|
pdurbin |
you can delete the data with some curl commands: https://github.com/dvn/solrpoc/blob/master/perl/clear.pl |
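As a rough illustration (not a copy of the linked script), clearing the index can be sketched as a delete-by-query posted to Solr's update handler. The URL assumes the `collection1` core on localhost mentioned earlier; `build_delete_all_request` and `delete_all` are hypothetical helper names.

```python
import urllib.request

# Assumed location of the update handler for the example core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def build_delete_all_request(url=SOLR_UPDATE_URL):
    """Build (but do not send) a request that deletes every document."""
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def delete_all(url=SOLR_UPDATE_URL):
    """Send the delete request; needs a running Solr at the given URL."""
    with urllib.request.urlopen(build_delete_all_request(url)) as response:
        return response.status


request = build_delete_all_request()
```

Calling `delete_all()` only makes sense with Solr running; building the request first makes it easy to inspect what would be sent.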
17:49 |
|
aditsu |
yeah, but I'm not sure how to use it for my requirements |
17:49 |
|
pdurbin |
what are they? :) |
17:50 |
|
aditsu |
well, the main thing is I need to manage some files with metadata attached |
17:51 |
|
aditsu |
pdf, jpg, doc, whatever |
17:51 |
|
aditsu |
there will be an upload form with a bunch of fields (for entering the facet values) |
17:52 |
|
aditsu |
and I planned to use a database to store the metadata and references to the files |
17:54 |
|
pdurbin |
sounds fairly similar to the open source app I work on, actually: http://thedata.org |
17:54 |
|
pdurbin |
people enter metadata for academic studies. I'm treating that metadata a facets (authorName, publicationDate, etc.) |
17:54 |
|
pdurbin |
as* facets |
17:56 |
|
pdurbin |
the metadata is stored in postgres. the facets and other lucene index files can be regenerated by re-indexing |
17:57 |
|
aditsu |
also, the system should support different types of users, with access restrictions |
17:58 |
|
aditsu |
if it was about books, you could have for example readers, authors and publishers |
17:58 |
|
pdurbin |
yep. our app has similar roles and permissions |
17:58 |
|
pdurbin |
readers of data, owners of data, etc. |
17:59 |
|
aditsu |
if you already have a database, then what's the point of using lucene index files? or vice versa |
18:00 |
|
pdurbin |
you could implement faceting entirely in postgres. evergreen ILS does (listed at http://wiki.greptilian.com/search/faceted ) |
18:03 |
|
aditsu |
yeah I intended to do that, but then I have those 2 questions |
18:04 |
|
pdurbin |
a lucene index feels kinda... ephemeral... I feel safer having the metadata in a database as well |
18:05 |
|
aditsu |
to explain them better.. imagine a shopping site for all kinds of products, for laptops you want to have facets like screen size, memory, cpu type etc. but those will not apply to e.g. toy cars, those are facets that depend on the values of other facets |
18:08 |
|
aditsu |
and then you can have manufacturer, series and model, like Lenovo, Ideapad, Yoga; you probably don't want to select the series before the manufacturer, and also the values should be limited to the ones made by that manufacturer |
18:08 |
|
pdurbin |
I can imagine such a site :) |
18:09 |
|
aditsu |
this is not just for searching (where you can limit by the values you have), but also for classification (when you add a new item) |
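The two kinds of dependency described above can be sketched with plain lookup tables. This is only one possible approach, not something proposed in the conversation. Lenovo, Ideapad and Yoga come from the example above; the facet names, the `scale` facet and the `ExampleCo` entries are hypothetical.

```python
# Kind 1: which facets apply depends on the value of another facet (category).
FACETS_BY_CATEGORY = {
    "laptop": ["manufacturer", "series", "model", "screen_size", "memory", "cpu_type"],
    "toy car": ["manufacturer", "scale"],  # "scale" is a made-up facet
}

# Kind 2: which values are allowed depends on the value chosen for a parent facet.
HIERARCHY = {
    "Lenovo": {"Ideapad": ["Yoga"]},
    "ExampleCo": {"Alpha": ["A1", "A2"]},  # hypothetical manufacturer
}


def applicable_facets(selection):
    """Facets to offer next, given what has been selected so far."""
    category = selection.get("category")
    if category is None:
        return ["category"]
    return FACETS_BY_CATEGORY.get(category, [])


def allowed_series(selection):
    """Series are only offered once a manufacturer has been chosen."""
    manufacturer = selection.get("manufacturer")
    if manufacturer is None:
        return []
    return sorted(HIERARCHY.get(manufacturer, {}))


def allowed_models(selection):
    """Models are limited to the chosen manufacturer and series."""
    series_map = HIERARCHY.get(selection.get("manufacturer"), {})
    return sorted(series_map.get(selection.get("series"), []))


print(applicable_facets({"category": "toy car"}))
print(allowed_series({"manufacturer": "Lenovo"}))
```

The same functions could back both the search sidebar and the upload form, since each answers "what can be chosen next, given what is already chosen".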
18:10 |
|
pdurbin |
in practice I add many facets to my search but I only get results for facets that exist |
18:10 |
|
pdurbin |
if that makes sense |
18:11 |
|
aditsu |
not sure what you mean |
18:12 |
|
pdurbin |
I could request authorName and productionDate facets but if nobody has been filling in productionDate I would only get authorName facets back |
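To make that point concrete, here is a small plain-Python sketch of the idea. It is an illustration only and does not reproduce how Solr computes or returns facets; the sample records and the `facet_counts` helper are hypothetical.

```python
from collections import Counter


def facet_counts(documents, requested_fields):
    """Count values per requested field, skipping fields nobody filled in."""
    counts = {}
    for field in requested_fields:
        values = Counter(doc[field] for doc in documents if doc.get(field))
        if values:  # a field with no values produces no facet at all
            counts[field] = dict(values)
    return counts


# Made-up studies: authorName is filled in, productionDate never is.
studies = [
    {"title": "Study A", "authorName": "Smith"},
    {"title": "Study B", "authorName": "Smith"},
    {"title": "Study C", "authorName": "Jones"},
]

result = facet_counts(studies, ["authorName", "productionDate"])
print(result)
```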
18:13 |
|
pdurbin |
oh, here. give this a try: https://github.com/pdurbin/lucene-facet-demo |
18:13 |
|
aditsu |
that's a different thing |
18:15 |
|
aditsu |
um, I don't use maven |
18:20 |
|
aditsu |
so I was wondering how to store facets and values.. would EAV be suitable? that makes queries kinda difficult |
18:21 |
|
pdurbin |
EAV? |
18:21 |
|
aditsu |
http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model |
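A minimal sketch of what EAV storage for facets might look like, using an in-memory SQLite database so it runs anywhere. Table names, column names and sample rows are all hypothetical. Counting the values of one facet is a simple GROUP BY, while filtering on several facets needs one subquery per facet, which is the "kinda difficult" part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item_attribute (
    item_id   INTEGER NOT NULL REFERENCES item(id),
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO item (id, name) VALUES (?, ?)",
    [(1, "laptop A"), (2, "laptop B"), (3, "toy car C")],
)
conn.executemany(
    "INSERT INTO item_attribute (item_id, attribute, value) VALUES (?, ?, ?)",
    [
        (1, "category", "laptop"), (1, "memory", "8GB"),
        (2, "category", "laptop"), (2, "memory", "4GB"),
        (3, "category", "toy car"),
    ],
)


def eav_facet_counts(conn, attribute):
    """Return (value, count) pairs for one facet, ordered by value."""
    return conn.execute(
        "SELECT value, COUNT(*) FROM item_attribute "
        "WHERE attribute = ? GROUP BY value ORDER BY value",
        (attribute,),
    ).fetchall()


def items_matching(conn, filters):
    """Return ids of items that have every attribute/value pair in filters."""
    if not filters:
        return [row[0] for row in conn.execute("SELECT id FROM item ORDER BY id")]
    # One SELECT per filter, combined with INTERSECT.
    sql = " INTERSECT ".join(
        "SELECT item_id FROM item_attribute WHERE attribute = ? AND value = ?"
        for _ in filters
    )
    params = [part for pair in filters.items() for part in pair]
    return sorted(row[0] for row in conn.execute(sql, params))


print(eav_facet_counts(conn, "category"))
print(items_matching(conn, {"category": "laptop", "memory": "8GB"}))
```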
18:22 |
|
pdurbin |
sounds fancy |
18:22 |
|
pdurbin |
these are probably good questions for #lucene-dev |
18:24 |
|
aditsu |
hmm if I ask lucene people, I expect I will get lucene answers |
18:24 |
|
pdurbin |
hmm, maybe ##programming |
18:26 |
|
aditsu |
I'll try reading more stuff first |
18:26 |
|
aditsu |
this seems interesting: http://www.miskatonic.org/library/facet-web-howto.html |
18:26 |
|
pdurbin |
aditsu: if I have a eureka moment, I'll let you know :) |
18:26 |
|
semiosis |
in case you missed it, a gem from monitorama: "Data mullet: relational db in the front, NoSQL in the back" -@lxt |
18:26 |
|
semiosis |
https://twitter.com/amateurhuman/status/317351099235446785 |
18:29 |
|
pdurbin |
aditsu: I've been meaning to read http://boonious.typepad.com/ux2/2011/01/implementing-faceted-search-ui.html ... maybe you can tell me if it's any good |
18:30 |
|
aditsu |
pdurbin: thanks, if I read it, I can tell you if it's any good *to me* :p |
23:39 |
|
pdurbin |
administrivia: I just tweaked philbot so we no longer see "somebody joined #sourcefu" in the logs. I don't see a lot of value in it |