IRC log for #sourcefu, 2016-11-20

http://sourcefu.com

All times shown according to UTC.

Time	Nick	Message
10:41		dotplus joined #sourcefu
10:41		dotplus joined #sourcefu
10:58		dotplus joined #sourcefu
11:02	dotplus	bear: is sleekxmpp still considered to be a/the Right Way to build xmpp clients/bots?
11:11	dotplus	in Python, that is:)
11:22		copyit joined #sourcefu
12:18		copyit joined #sourcefu
12:26		copyit joined #sourcefu
12:33		copyit joined #sourcefu
12:43		copyit joined #sourcefu
14:12	pdurbin	Huh, apparently https://github.com/pdurbin.atom is the new https://github.com/pdurbin?tab=activity (which doesn't work anymore). See also http://stackoverflow.com/questions/9128049/view-entire-activities-of-an-user-in-github#comment66602673_9128958
14:19	pdurbin	The problem is that that Atom link doesn't render in Chrome. You just see the XML. It does render fine in Firefox at least.
14:20	pdurbin	But I don't particularly want to install Firefox on my Android phone. I'm fine with the default browser (Chrome).
14:21	pdurbin	huh, "It shows the XML code, unformatted" 84 - RSS or Atom support needed - chromium - Monorail - https://bugs.chromium.org/p/chromium/issues/detail?id=84
14:23	pdurbin	https://bugs.chromium.org/p/chromium/issues/detail?id=84#c149 says the bug can be fixed by installing https://chrome.google.com/extensions/detail/nlbjncdgjeocebhnmkbbbdekmmmcbfjd
14:25	pdurbin	But apparently I can't install that extension on Android.
14:48	pdurbin	ooh, `.mode line` in sqlite is nice: https://www.sqlite.org/cli.html
14:59	pdurbin	To back up a bit, this is the question that was just asked: One of the questions is to find the state with the most counties in it from the census data in USA http://www.census.gov/popest/data/counties/totals/2015/files/CO-EST2015-alldata.csv
14:59	pdurbin	over at https://gitter.im/pydata/pandas?at=58318fd0a5bc784f5658023f
14:59	pdurbin	How would people in this channel get the answer?
14:59	pdurbin	aditsu bear codex dotplus prologic semiosis sivoais tumdedum westmaas ^^
15:00	aditsu	huh?
15:00	pdurbin	Lately I've been thinking I should learn SQL better: http://irclog.greptilian.com/sourcefu/2016-11-14
15:01	pdurbin	aditsu: how would you find the state with the most counties in it based on that csv file?
15:01	aditsu	let me see the file..
15:02	aditsu	I got nxdomain for www.census.gov o_O
15:02	pdurbin	aditsu: can you grab it from http://tmp.greptilian.com/tmp/data/CO-EST2015-alldata.csv ?
15:03	aditsu	(but dig says servfail)
15:03	aditsu	yes, that one worked
15:04	pdurbin	cool. lemme know your approach
15:04	aditsu	any language requirement? or algorithm question?
15:04	pdurbin	nope. just get the answer
15:04	pdurbin	which state and how many counties
15:05	aditsu	they're grouped by state, so I can just check sequentially
15:06	pdurbin	but with what tool? I'm using sqlite
15:06	aditsu	brb
15:08	aditsu	so, my first idea is libreoffice calc, but I don't know the functions well enough; 2nd idea is CJam :)
15:08	aditsu	I can probably do it in a couple of minutes
15:10	aditsu	[[1 "STNAME"] [68 "Alabama"] [30 "Alaska"] [16 "Arizona"] [76 "Arkansas"] [59 "California"] [65 "Colorado"] [9 "Connecticut"] [4 "Delaware"] [2 "District of Columbia"] [68 "Florida"] [160 "Georgia"] [6 "Hawaii"] [45 "Idaho"] [103 "Illinois"] [93 "Indiana"] [100 "Iowa"] [106 "Kansas"] [121 "Kentucky"] [65 "Louisiana"] [17 "Maine"] [25 "Maryland"] [15 "Massachusetts"] [84 "Michigan"] [88...
15:10	aditsu	..."Minnesota"] [83 "Mississippi"] [116 "Missouri"] [57 "Montana"] [94 "Nebraska"] [18 "Nevada"] [11 "New Hampshire"] [22 "New Jersey"] [34 "New Mexico"] [63 "New York"] [101 "North Carolina"] [54 "North Dakota"] [89 "Ohio"] [78 "Oklahoma"] [37 "Oregon"] [68 "Pennsylvania"] [6 "Rhode Island"] [47 "South Carolina"] [67 "South Dakota"] [96 "Tennessee"] [255 "Texas"] [30 "Utah"] [15 "Vermont"]...
15:10	aditsu	...[134 "Virginia"] [40 "Washington"] [56 "West Virginia"] [73 "Wisconsin"] [24 "Wyoming"] [1 ""]]
15:11	aditsu	oops, a bit too long :p
15:11	pdurbin	maybe I should start a "katas" area for #sourcefu like I did for #crimsonfu: https://github.com/crimsonfu/code/tree/master/katas
15:11	aditsu	result: [255 "Texas"]
15:11	aditsu	full code: qN/',f/5f=e`{0=}$W=p
15:12	pdurbin	aditsu: cool, then you just need to sort and pick the biggest
15:12	aditsu	yeah, just did
15:12	pdurbin	ah, a bit of lag
15:12	aditsu	I can probably make it a bit shorter
15:12	pdurbin	my sqlite solution: select stname,count(ctyname) from census group by stname order by count(ctyname) desc limit 1;
15:13	aditsu	that works
15:13	pdurbin	aditsu: is that CJam?
15:13	aditsu	yes
15:13	pdurbin	it's a little cryptic :)
15:14	aditsu	yeah.. it's a golfing language so not very readable, but very concise :)
15:14	pdurbin	maybe sivoais can come up with something shorter in Perl :)
15:14	pdurbin	I'm not very into golfing myself.
15:14	aditsu	updated a bit: qN%',f/5f=e`$W=p
15:15	aditsu	I doubt perl can get shorter
15:15	pdurbin	me neither
15:15	pdurbin	aditsu: where's the bit where you read in the file?
15:15	aditsu	"q" reads the whole file
15:15	aditsu	as a string
15:16	pdurbin	ok
15:16	aditsu	the key part is "e`" which does RLE compression
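For readers who don't know CJam, the run-length idea aditsu describes can be sketched in Python. This is only an illustration, not a translation anyone in the channel wrote: `SAMPLE` is a made-up five-row stand-in for the census file, `most_rows_per_state` is a hypothetical helper name, and the naive comma split assumes no field contains a quoted comma. Like the one-liner, it counts every row per state and relies on rows being grouped by state.

```python
from itertools import groupby

# Tiny made-up stand-in for CO-EST2015-alldata.csv (only the first 7 columns).
SAMPLE = """\
SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME
040,3,6,01,000,Alabama,Alabama
050,3,6,01,001,Alabama,Autauga County
050,3,6,01,003,Alabama,Baldwin County
040,4,9,02,000,Alaska,Alaska
050,4,9,02,013,Alaska,Aleutians East Borough
"""

def most_rows_per_state(text):
    # Field 5 of each line is STNAME (naive split, no quoted commas assumed).
    names = [line.split(",")[5] for line in text.splitlines() if line]
    # Run-length encode consecutive equal names into (count, name) pairs.
    runs = [(len(list(group)), name) for name, group in groupby(names)]
    # The largest pair is the longest run (ties broken by name).
    return max(runs)

print(most_rows_per_state(SAMPLE))
```

Because the header line is treated as data, it shows up as its own run of length one, which matches the `[1 "STNAME"]` entry in the output pasted earlier.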
15:20	aditsu	can sqlite query a csv file directly?
15:20	aditsu	if not, there's http://harelba.github.io/q/
15:21	pdurbin	aditsu: yeah, you just do `.import CO-EST2015-alldata.csv census` or whatever
15:21	aditsu	ah, there's an import step
15:21	pdurbin	oh, `.mode csv` first. see "CSV Import" at https://www.sqlite.org/cli.html
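The import-then-query workflow can also be sketched from Python with the standard `sqlite3` and `csv` modules. This is a rough approximation of what `.mode csv` plus `.import` do in the sqlite shell, not a description of its internals: `SAMPLE` is made-up data, `load_census` is a hypothetical helper, and storing every column as TEXT is an assumption chosen to keep values like `'000'` intact.

```python
import csv
import io
import sqlite3

# Made-up stand-in for the real census CSV (first 7 columns only).
SAMPLE = """\
SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME
040,3,6,01,000,Alabama,Alabama
050,3,6,01,001,Alabama,Autauga County
050,3,6,01,003,Alabama,Baldwin County
040,4,9,02,000,Alaska,Alaska
050,4,9,02,013,Alaska,Aleutians East Borough
"""

def load_census(conn, text):
    # Use the header row for column names and keep every value as TEXT.
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f"CREATE TABLE census ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO census VALUES ({placeholders})", data)

conn = sqlite3.connect(":memory:")
load_census(conn, SAMPLE)

# Same query as pdurbin's 15:12 message.
top = conn.execute(
    "select stname, count(ctyname) from census "
    "group by stname order by count(ctyname) desc limit 1"
).fetchone()
print(top)
```

To try it on the real file, the same helper could be fed the contents of CO-EST2015-alldata.csv read from disk instead of `SAMPLE`.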
15:25	pdurbin	aditsu: have you used `q` and if so, do you like it?
15:25	aditsu	I don't remember if I actually tried it :p
15:26	pdurbin	ok, it's a neat idea
15:26	aditsu	I have a q command on this machine, but it's a different thing
15:26	pdurbin	I don't mind the import step. I like that sqlite is everywhere.
15:26	pdurbin	maybe I'll try to figure out how to get the answer in R
15:27	aditsu	I haven't used sqlite much
15:29	pdurbin	aditsu: I'm surprised you haven't mentioned Depeche yet.
15:30	aditsu	haha, I was thinking about it :) I have some code that enables it to use csv files directly, but it needs some more work
15:31	aditsu	also I haven't implemented stuff like group by (with count)
15:32	aditsu	of course, if you load the csv into a database, you can use that, but then you can do it manually in sql without java code
15:33	pdurbin	aditsu: our answer is wrong!
15:33	aditsu	wat?
15:34	aditsu	is it 254?
15:35	aditsu	the answer is correct, the input is wrong :D
15:36	pdurbin	time to fix our code
15:36	aditsu	the code is perfectly fine based on the problem description
15:37	pdurbin	nope
15:37	pdurbin	leave it to my wife to actually look at the data :)
15:37	pdurbin	aditsu: go fix your code. I'll try to fix mine.
15:38	aditsu	yes it is fine, the data is wrong
15:38	aditsu	I also noticed that but I initially thought they have counties with the same name
15:41	aditsu	so just need to subtract 1 to compensate for the wrong input
15:43	aditsu	On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?"
15:43	aditsu	I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
16:04	pdurbin	ok, this fixes it: select stname,count(ctyname) from census where county != '000' group by stname order by count(ctyname) desc limit 1;
16:04	aditsu	you could have just added a "-1"
16:06	pdurbin	where?
16:06	aditsu	count(ctyname)-1
16:06	aditsu	in the select part
16:08	pdurbin	ah. thanks. yes, this works: select stname,count(ctyname)-1 from census group by stname order by count(ctyname) desc limit 1;
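The two fixes can be compared side by side on a small table. This is a sketch under assumptions: `ROWS` is invented data with only four of the real columns, and it mirrors the real file only in having one state-level row (county `'000'`) per state followed by that state's county rows.

```python
import sqlite3

# Invented rows: (sumlev, county, stname, ctyname).
ROWS = [
    ("040", "000", "Alabama", "Alabama"),
    ("050", "001", "Alabama", "Autauga County"),
    ("050", "003", "Alabama", "Baldwin County"),
    ("040", "000", "Alaska", "Alaska"),
    ("050", "013", "Alaska", "Aleutians East Borough"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE census (sumlev TEXT, county TEXT, stname TEXT, ctyname TEXT)"
)
conn.executemany("INSERT INTO census VALUES (?, ?, ?, ?)", ROWS)

ORDER = " group by stname order by count(ctyname) desc limit 1"

# Fix 1: drop the state-level summary rows before counting.
filtered = conn.execute(
    "select stname, count(ctyname) from census where county != '000'" + ORDER
).fetchone()

# Fix 2: count everything, then subtract the one summary row per state.
minus_one = conn.execute(
    "select stname, count(ctyname) - 1 from census" + ORDER
).fetchone()

print(filtered, minus_one)
```

Both give the same answer here. The `where` filter states which rows are excluded, while the `-1` version is shorter but depends on every state having exactly one summary row.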
16:26		aditsu_phone joined #sourcefu
16:56	pdurbin	ah, https://www.census.gov/popest/data/counties/totals/2015/files/CO-EST2015-alldata.pdf is awfully helpful
16:56	pdurbin	via https://www.census.gov/popest/data/counties/totals/2015/CO-EST2015-alldata.html
16:56	pdurbin	(the first hit for the csv file name)
17:04	pdurbin	this is interesting: select stname,division from census where sumlev = '040' order by division;
17:05	pdurbin	Ohio (where I grew up) is considered "East North Central". I thought I was from the Midwest. :)
17:08		AndChat\|264089 joined #sourcefu
17:13	pdurbin	I've forgotten all of my R, sadly.
17:14	pdurbin	I should beef up my notes at http://wiki.greptilian.com/r
17:21	pdurbin	ah, according to https://gitter.im/pydata/pandas?at=5831db4b3418b2e57f2ba695 and http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html a solution in pandas is this: df.groupby('STNAME').size().idxmax()
17:25	pdurbin	I bet it doesn't account for SUMLEV though.
17:55	pdurbin	Yeah, the revised pandas answer: df[df['SUMLEV']==50].groupby('STNAME').size().idxmax()
18:16	pdurbin	oh, I am still from the midwest: select stname,region from census where sumlev = '040' and region = 2;
18:16	pdurbin	I didn't notice "region". :)
