IRC log for #sourcefu, 2013-04-02

http://sourcefu.com

All times shown according to UTC.

Time S Nick Message
11:50 pdurbin listening to a new podcast about git. it's pretty good: http://episodes.gitminutes.com/2013/03/gitminutes-01-randal-l-schwartz-on.html
17:30 aditsu joined #sourcefu
17:31 aditsu pdurbin: hey, so I have some questions about faceted search
17:31 aditsu I think it's simple and fun as long as you have global and independent facets
17:31 pdurbin i'm up to my ears in faceted search
17:32 pdurbin it is kinda fun
17:32 aditsu but how do you deal with it if you have 1) facets that depend on the values of other facets and 2) facet values that depend on the values of other facets?
17:34 pdurbin hmm, I'm not really doing that. but you can combine facets. solr seems to call this "pivot facets"
17:34 pdurbin aditsu: have you spun up solr and played with Solritas - http://localhost:8983/solr/collection1/browse ?
17:34 aditsu I barely heard of solr.. do you think I should use it?
17:35 pdurbin maybe. you should at least spin it up and play around with that interface
17:36 aditsu umm.. that localhost link doesn't seem very useful
17:36 pdurbin and the admin gui at http://localhost:8983/solr
17:37 pdurbin crimsonfubot: lucky solr 4 tutorial
17:37 crimsonfubot pdurbin: http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html
17:37 pdurbin bad crimsonfubot
17:37 pdurbin use this one: http://lucene.apache.org/solr/4_2_0/tutorial.html
17:37 aditsu ah, you mean after I run it
17:38 pdurbin yeah
17:38 pdurbin java -jar start.jar
17:38 pdurbin then
17:38 pdurbin java -jar post.jar *.xml
17:38 aditsu what the... 111MB
17:40 pdurbin maybe elastic search is smaller
17:42 pdurbin aditsu: I mentioned your question at http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2013-04-02#l74
17:44 aditsu ok I got solr running
17:44 pdurbin I find that Solritas interface a nice starting point for faceting... what it is... how you might implement faceting. I'm also keeping a list of open source projects with faceting at http://wiki.greptilian.com/search/faceted
17:46 aditsu so what does it index?
17:46 aditsu xml files?
17:46 pdurbin good question
17:47 pdurbin so far I've only fed xml files to solr
17:47 pdurbin did you feed it the example xml files? about electronics?
17:48 aditsu yeah I did java -jar post.jar *.xml
17:48 pdurbin cool, so you have some data to play with
17:49 pdurbin you can delete the data with some curl commands: https://github.com/dvn/solrpoc/blob/master/perl/clear.pl
17:49 aditsu yeah, but I'm not sure how to use it for my requirements
17:49 pdurbin what are they? :)
17:50 aditsu well, the main thing is I need to manage some files with metadata attached
17:51 aditsu pdf, jpg, doc, whatever
17:51 aditsu there will be an upload form with a bunch of fields (for entering the facet values)
17:52 aditsu and I planned to use a database to store the metadata and references to the files
17:54 pdurbin sounds fairly similar to the open source app I work on, actually: http://thedata.org
17:54 pdurbin people enter metadata for academic studies. I'm treating that metadata as facets (authorName, publicationDate, etc.)
17:56 pdurbin the metadata is stored in postgres. the facets and other lucene index files can be regenerated by re-indexing
17:57 aditsu also, the system should support different types of users, with access restrictions
17:58 aditsu if it was about books, you could have for example readers, authors and publishers
17:58 pdurbin yep. our app has similar roles and permissions
17:58 pdurbin readers of data, owners of data, etc.
17:59 aditsu if you already have a database, then what's the point of using lucene index files? or vice versa
18:00 pdurbin you could implement faceting entirely in postgres. evergreen ILS does (listed at http://wiki.greptilian.com/search/faceted )
18:03 aditsu yeah I intended to do that, but then I have those 2 questions
18:04 pdurbin a lucene index feels kinda... ephemeral... I feel safer having the metadata in a database as well
18:05 aditsu to explain them better.. imagine a shopping site for all kinds of products, for laptops you want to have facets like screen size, memory, cpu type etc. but those will not apply to e.g. toy cars, those are facets that depend on the values of other facets
18:08 aditsu and then you can have manufacturer, series and model, like Lenovo, Ideapad, Yoga; you probably don't want to select the series before the manufacturer, and also the values should be limited to the ones made by that manufacturer
18:08 pdurbin I can imagine such a site :)
18:09 aditsu this is not just for searching (where you can limit by the values you have), but also for classification (when you add a new item)
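The two kinds of dependency aditsu describes (facets that only apply for some categories, and values limited by a parent facet) could be modelled roughly as below. This is only a sketch in plain Python, not anything Solr or Lucene provides: the table names, the toy_car "scale" facet and the "ThinkPad" series are made-up sample data, and the functions are hypothetical helpers. The same lookups could serve both the search UI and the upload form when classifying a new item.

```python
# Hypothetical schema: which facets apply depends on the "category" facet,
# and which values a facet allows depends on the value of its parent facet.
FACETS_BY_CATEGORY = {
    "laptop": ["manufacturer", "series", "model",
               "screen_size", "memory", "cpu_type"],
    "toy_car": ["manufacturer", "scale"],
}

# Parent of each dependent facet (series depends on manufacturer, and so on).
PARENT = {"series": "manufacturer", "model": "series"}

# Allowed values keyed by (facet, parent value); made-up sample data.
ALLOWED = {
    ("series", "Lenovo"): ["Ideapad", "ThinkPad"],
    ("model", "Ideapad"): ["Yoga"],
}


def applicable_facets(selection):
    """Return the facets that can be offered for the current selection."""
    category = selection.get("category")
    if category is None:
        # Nothing else can be decided until a category is chosen.
        return ["category"]
    facets = []
    for facet in FACETS_BY_CATEGORY.get(category, []):
        parent = PARENT.get(facet)
        # Offer a dependent facet only once its parent has a value.
        if parent is None or parent in selection:
            facets.append(facet)
    return facets


def allowed_values(facet, selection):
    """Return permitted values for a dependent facet, or None if unrestricted."""
    parent = PARENT.get(facet)
    if parent is None:
        return None
    return ALLOWED.get((facet, selection.get(parent)), [])
```

With a selection of category "laptop" and manufacturer "Lenovo", "series" becomes available but "model" does not, and the series values are limited to the ones listed for Lenovo.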
18:10 pdurbin in practice I add many facets to my search but I only get results for facets that exist
18:10 pdurbin if that makes sense
18:11 aditsu not sure what you mean
18:12 pdurbin I could request authorName and productionDate facets but if nobody has been filling in productionDate I would only get authorName facets back
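The behaviour pdurbin describes (asking for several facets but only getting back the ones that have data) can be imitated in a few lines of plain Python. This is a sketch of the idea only, not Solr's API or response format: facet_counts and the sample studies are hypothetical.

```python
from collections import Counter


def facet_counts(docs, requested):
    """Count values per requested facet, skipping facets with no data."""
    counts = {}
    for facet in requested:
        counter = Counter(
            doc[facet] for doc in docs if doc.get(facet) is not None
        )
        # A facet nobody filled in produces no entry at all.
        if counter:
            counts[facet] = dict(counter)
    return counts


# Made-up sample documents: nobody has filled in productionDate.
docs = [
    {"title": "Study A", "authorName": "Smith"},
    {"title": "Study B", "authorName": "Smith"},
    {"title": "Study C", "authorName": "Jones"},
]

result = facet_counts(docs, ["authorName", "productionDate"])
```

Here result contains counts for authorName only, because no document carries a productionDate value.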
18:13 pdurbin oh, here. give this a try: https://github.com/pdurbin/lucene-facet-demo
18:13 aditsu that's a different thing
18:15 aditsu um, I don't use maven
18:20 aditsu so I was wondering how to store facets and values.. would EAV be suitable? that makes queries kinda difficult
18:21 pdurbin EAV?
18:21 aditsu http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
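To make the EAV question concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The item and item_facet tables, the sample rows and both helper functions are assumptions for illustration. It shows why queries get awkward: matching several facets at once needs a GROUP BY with HAVING (or one join per facet), although counting values for a single facet stays simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
-- One row per (entity, attribute, value) triple.
CREATE TABLE item_facet (
    item_id INTEGER REFERENCES item(id),
    facet   TEXT,
    value   TEXT
);
""")
conn.executemany("INSERT INTO item VALUES (?, ?)", [
    (1, "report.pdf"), (2, "photo.jpg"), (3, "notes.doc"),
])
conn.executemany("INSERT INTO item_facet VALUES (?, ?, ?)", [
    (1, "type", "pdf"), (1, "author", "alice"),
    (2, "type", "jpg"), (2, "author", "alice"),
    (3, "type", "doc"), (3, "author", "bob"),
])


def find_items(conn, filters):
    """Return names of items matching every facet/value pair in filters."""
    if not filters:
        return [row[0] for row in
                conn.execute("SELECT name FROM item ORDER BY id")]
    pairs = list(filters.items())
    clause = " OR ".join("(f.facet = ? AND f.value = ?)" for _ in pairs)
    params = [part for pair in pairs for part in pair]
    # Keep an item only if it matched as many distinct facets as requested.
    sql = (
        "SELECT i.name FROM item i "
        "JOIN item_facet f ON f.item_id = i.id "
        f"WHERE {clause} "
        "GROUP BY i.id HAVING COUNT(DISTINCT f.facet) = ? "
        "ORDER BY i.id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(pairs)])]


def facet_value_counts(conn, facet):
    """Return {value: number of items} for one facet."""
    rows = conn.execute(
        "SELECT value, COUNT(*) FROM item_facet "
        "WHERE facet = ? GROUP BY value", (facet,))
    return dict(rows)
```

For example, find_items(conn, {"author": "alice", "type": "pdf"}) matches only report.pdf, and facet_value_counts(conn, "author") gives the counts a facet sidebar would show.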
18:22 pdurbin sounds fancy
18:22 pdurbin these are probably good questions for #lucene-dev
18:24 aditsu hmm if I ask lucene people, I expect I will get lucene answers
18:24 pdurbin hmm, maybe ##programming
18:26 aditsu I'll try reading more stuff first
18:26 aditsu this seems interesting: http://www.miskatonic.org/library/facet-web-howto.html
18:26 pdurbin aditsu: if I have a eureka moment, I'll let you know :)
18:26 semiosis in case you missed it, a gem from monitorama: "Data mullet: relational db in the front, NoSQL in the back" -@lxt
18:26 semiosis https://twitter.com/amateurhuman/status/317351099235446785
18:29 pdurbin aditsu: I've been meaning to read http://boonious.typepad.com/ux2/2011/01/implementing-faceted-search-ui.html ... maybe you can tell me if it's any good
18:30 aditsu pdurbin: thanks, if I read it, I can tell you if it's any good *to me* :p
23:39 pdurbin administrivia: I just tweaked philbot so we no longer see "somebody joined #sourcefu" in the logs. I don't see a lot of value in it
