Time |
S |
Nick |
Message |
12:55 |
|
pdurbin |
Hmm, "sourcefu is a software development community" doesn't say much. Maybe I need to emphasize the "fu" part, which I added back when we talked about it at http://irclog.greptilian.com/sourcefu/2015-08-16 |
12:55 |
|
pdurbin |
Or register a different domain, come up with a better name. |
13:22 |
|
pdurbin |
tumdedum: which of these topics (or others) are you most interested in? https://github.com/sourcefu/sourcefu.github.com/blob/master/_includes/topics.yaml |
13:30 |
|
pdurbin |
aditsu bear codex dotplus prologic semiosis sivoais_ tumdedum I just threw together an image at http://i.imgur.com/Z9aSjXM.png where I tried to jot down what I think you're most interested in. Please let me know what you think. |
13:31 |
|
pdurbin |
And thanks for hanging out in #sourcefu with me. :) |
13:38 |
|
aditsu |
pdurbin: from the ones you listed, I'm also interested in SQL and Python, and to a lesser degree in some of the other ones |
13:41 |
|
aditsu |
you could add Mercurial, algorithms and competitive programming.. |
13:46 |
|
aditsu |
btw, I wrote my most complex Haskell program yet for the google code jam this year :p |
13:47 |
|
aditsu |
(it's also probably only my second Haskell program :p) |
13:56 |
|
pdurbin |
aditsu: thanks, I know you're into all sorts of languages. I'll fix it up and push it into git once others have weighed in. |
13:59 |
|
aditsu |
it's mainly about competing for the "multiple languages" statistics in the gcj :) I don't really have time to learn and work with that many languages normally |
13:59 |
|
pdurbin |
yeah, makes sense |
13:59 |
|
pdurbin |
for now I updated http://tmp.greptilian.com/tmp/sourcefu |
14:01 |
|
aditsu |
nice |
14:03 |
|
aditsu |
well, if you added those 3 new things, you might as well link me to them |
14:32 |
|
pdurbin |
done |
15:13 |
|
aditsu |
hmm, looks like you're not interested in python anymore :p |
15:14 |
|
pdurbin |
heh, no, I am, I'm just trying to reduce the clutter visually :) |
15:15 |
|
pdurbin |
I'll be even more interested in Python once everyone is on Python 3. |
15:22 |
|
pdurbin |
aditsu: I forget if I asked you if you'd be interested in adding yourself to https://github.com/sourcefu/sourcefu.github.com/tree/master/members |
15:23 |
|
aditsu |
what for? |
15:25 |
|
pdurbin |
because I like what folks have written at http://sourcefu.com/members/bear and http://sourcefu.com/members/prologic and http://sourcefu.com/members/sivoais |
15:25 |
|
pdurbin |
no pressure, it's just an open invitation to everyone |
18:21 |
|
bear |
for me - link to Bash and Docker |
18:25 |
|
pdurbin |
bear: fixed, thanks (in tmp) |
18:25 |
|
bear |
+1 |
18:31 |
|
prologic |
anyone around? |
18:31 |
|
prologic |
got a data synchronization / distributed systems problem to solve |
18:38 |
|
bear |
oh my |
18:39 |
|
bear |
that's like oil and water - you can find something that works for a little while, but it always separates in the end |
18:40 |
|
bear |
is the data required to be in sync when received, or sync as in the same copy on multiple nodes |
18:41 |
|
bear |
i.e. is it "message pushed to all receivers such that they all act at the same time" or "we have 20 items and we want eventually to have all be on 100 nodes" |
18:45 |
|
prologic |
Okay this is hopefully much simpler: |
18:45 |
|
prologic |
https://github.com/circuits/irclogger is the thing I'm talking about |
18:45 |
|
prologic |
A distributed (well, trying to be) IRC logger bot with a web interface |
18:46 |
|
prologic |
it writes raw logs to files on disk in a format that irclog2html (3rd party) can scrape into static html files |
18:46 |
|
bear |
so it's a stream of log items from a single source to 0-N receivers |
18:46 |
|
prologic |
I'm setting up a replacement system for this (old one has issues) and now have a high-availability block storage volume attached to the new VM |
18:47 |
|
prologic |
which I assume can also be attached to multiple VMs simultaneously |
18:47 |
|
prologic |
well if I run two instances of the bot with identical configuration |
18:47 |
|
prologic |
it'll be N-streams that need to converge I guess |
18:48 |
|
bear |
yea, that's a merge issue - somewhere you have to have a choke point that knows whether a message has been seen and discards it |
18:48 |
|
prologic |
yeah |
18:48 |
|
bear |
your biggest issue there is time sync |
18:48 |
|
prologic |
can two processes append to a file at the same time without corruption? |
18:48 |
|
prologic |
I'm not sure tbh |
18:48 |
|
bear |
I have never thought to even try |
18:48 |
|
prologic |
me neither |
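
A quick way to actually try it, in Python (the file name and line count below are made up): two processes append complete lines to the same file through an O_APPEND descriptor, one write() per line, and the parent then checks that nothing came out interleaved or truncated.

    import os
    from multiprocessing import Process

    PATH, LINES = "append-test.log", 10000   # hypothetical name / count

    def writer(tag):
        # O_APPEND descriptor; exactly one os.write() per complete line
        fd = os.open(PATH, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            for i in range(LINES):
                os.write(fd, ("%s %06d %s\n" % (tag, i, "x" * 80)).encode())
        finally:
            os.close(fd)

    if __name__ == "__main__":
        if os.path.exists(PATH):
            os.remove(PATH)
        procs = [Process(target=writer, args=(t,)) for t in ("A", "B")]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        with open(PATH) as f:
            lines = f.read().splitlines()
        # every line must still be a single, whole record from one writer
        assert len(lines) == 2 * LINES
        assert all(len(l.split()) == 3 and l.endswith("x" * 80) for l in lines)
        print("no interleaved or truncated lines")

The important detail is that each record is emitted with a single write() call to an O_APPEND descriptor, which is exactly the condition the O_APPEND discussion further down turns on.
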
18:49 |
|
prologic |
alternatively N-streams of files to the shared volume |
18:49 |
|
prologic |
and a separate process that merges the streams |
18:49 |
|
bear |
I would push to a queue and then have a small in memory cache of prior items for a single process to merge |
18:49 |
|
prologic |
hmm |
18:50 |
|
bear |
using a queue also allows you to get a real-time(ish) stream for your web side to receive updates |
18:50 |
|
prologic |
is what I'm trying to do considered a quorum problem? |
18:50 |
|
prologic |
in which case you'd need a minimum of 3 instances to agree on the state of the stream |
18:50 |
|
prologic |
I don't wanna make this too hard :D |
18:50 |
|
prologic |
irclogger is only ~350 loc |
18:51 |
|
bear |
yes, you are in the C part of CAP |
18:51 |
|
prologic |
yeah sadly the web side is rather nasty (quick 'n dirty) |
18:51 |
|
bear |
I would do redis or rmq to push into |
18:51 |
|
prologic |
http://irclogs.shortcircuit.net.au/ |
18:51 |
|
bear |
and then a small reader that groks the hash of the line |
18:52 |
|
prologic |
and if we've seen that line before skip it? |
18:53 |
|
bear |
yep |
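
A rough sketch of that queue-plus-hash approach in Python (Redis, the "irclog:lines" key, and the cache size are assumptions; bear only says "redis or rmq"): each bot instance pushes formatted lines onto a list, and a single merger keeps a bounded cache of recent line hashes so the duplicate copy from the other instance is dropped before it reaches the raw log file.

    import hashlib
    from collections import OrderedDict

    import redis  # pip install redis

    QUEUE, CACHE_SIZE, LOGFILE = "irclog:lines", 1000, "channel.log"

    def merger():
        r = redis.Redis()
        seen = OrderedDict()                  # bounded cache of recent hashes
        with open(LOGFILE, "a") as out:
            while True:
                _key, raw = r.blpop(QUEUE)    # blocks until a line arrives
                digest = hashlib.sha1(raw).hexdigest()
                if digest in seen:            # the other instance already sent it
                    continue
                seen[digest] = True
                if len(seen) > CACHE_SIZE:
                    seen.popitem(last=False)  # evict the oldest hash
                out.write(raw.decode("utf-8", "replace"))
                out.flush()

    # each bot instance would do something like:
    #   redis.Redis().rpush(QUEUE, "18:49 <prologic> ...\n")
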
18:54 |
|
prologic |
kinda funny though |
18:54 |
|
prologic |
the Logger component (referring to the source) is already a queue internally |
18:54 |
|
prologic |
the way circuits itself is designed as a framework, it supports events and distributed messaging |
18:55 |
|
bear |
yea, you could use one of the circuits internals as the queue and writer part |
18:55 |
|
bear |
no need for redis or rmq if you don't mind having everything be in memory |
18:55 |
|
bear |
(which, having just pressed enter, I realize that is exactly what redis and rmq are for the most part) |
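
A minimal sketch of keeping it inside circuits (the logline event and LogWriter component are made up for illustration, not what irclogger actually defines): handlers run on the framework's event loop, so a single writer component becomes the choke point bear described, at least for lines produced within one bot instance.

    from circuits import Component, Event

    class logline(Event):
        """Carries one formatted log line."""

    class LogWriter(Component):
        def __init__(self, path):
            super(LogWriter, self).__init__()
            self.path = path

        def logline(self, line):
            # handler name matches the event name; appends are serialized
            # through this one handler on the event loop
            with open(self.path, "a") as f:
                f.write(line + "\n")

    # elsewhere in the bot: self.fire(logline("18:55 <prologic> ..."))

Note this only serializes writes within a single instance; merging the streams from two instances still needs something like the queue above.
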
18:56 |
|
prologic |
:D |
18:56 |
|
prologic |
only problem is coordination |
18:56 |
|
prologic |
but I realize looking at the code again |
18:56 |
|
prologic |
we also write to raw files opened in append mode |
18:57 |
|
prologic |
so this may be simpler than I thought, as long as I can find out / verify myself that N or more writers can safely append to the same file |
18:57 |
|
prologic |
which if I remember my UNIX(ish) I think is the case |
18:57 |
|
bear |
I know that they can both write, it's just a matter of the assumption that ordering will be valid |
18:58 |
|
bear |
if you don't mind the occasional out-of-order line ... go for it |
19:00 |
|
prologic |
Found something relevant: https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix |
19:00 |
|
prologic |
I think the ordering doesn't matter tbh |
19:01 |
|
prologic |
because that can be sorted on the processing side |
19:01 |
|
bear |
yea, ordering (aka file tearing) only appears for READS |
19:01 |
|
prologic |
> However concurrent reads to atomic appends may see torn writes depending on OS, filing system, and what flags you opened the file with - the increment of the maximum file extent is atomic, but the visibility of the writes with respect to reads may or may not be atomic. Here is a quick summary by flags, OS and filing system: |
19:01 |
|
bear |
O_APPEND is atomic for lines smaller than the buffer |
19:02 |
|
prologic |
actually I think if you read the 3rd answer that's wrong |
19:02 |
|
prologic |
writes with O_APPEND seem to be atomic regardless of buffer size |
19:02 |
|
bear |
oh, that's from memory from back before ext4 was even a thought |
19:02 |
|
prologic |
it's the visibility from reads that isn't, depending on the OS and filesystem |
19:02 |
|
bear |
cool |
19:03 |
|
bear |
then yea, you could use a temp file as your queue |
19:03 |
|
prologic |
so the processing may have to take that into account |
19:03 |
|
prologic |
I already do :) |
19:03 |
|
prologic |
irclogger -> raw log files per day (rotated) -> html files (processed by irclog2html) |
19:03 |
|
bear |
if you order based on the irc server's timestamp and not the processors - you should be ok |
19:05 |
|
prologic |
> So, to answer the OP's question, O_APPEND writes will not interfere with one another, but reads concurrent to O_APPEND writes will probably see torn writes unless O_DIRECT is on, whereupon your O_APPEND writes would need to be a sector size multiple. |
19:05 |
|
prologic |
There. |
19:06 |
|
prologic |
But the only thing left now is "how are torn writes seen by readers" |
19:06 |
|
prologic |
maybe this manifests from the reader as incomplete lines? |
19:06 |
|
bear |
yea, I wonder how often you would get partial line reads |
19:06 |
|
prologic |
that's the only thing I can think of |
19:06 |
|
prologic |
yeah |
19:06 |
|
prologic |
so I'll just guard against a missing \r\n on the line |
19:07 |
|
bear |
I think we are in the realm of "would have to be a lot of lines written *very* quickly" territory |
19:07 |
|
bear |
yea, if you could add a start-of-line marker and then check for > 1 per newline... |
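
A sketch of that guard on the reading side (assuming each record is a single line starting with an HH:MM timestamp; the function name is made up): ignore a torn trailing line that has no newline yet, and sort by the IRC timestamp the bot wrote rather than by which instance won the append, before handing the day's file to irclog2html.

    def read_complete_lines(path):
        with open(path, "rb") as f:
            data = f.read()
        # a torn write shows up as a final chunk with no newline; leave it
        # for the next pass instead of emitting half a message
        complete, _sep, _partial = data.rpartition(b"\n")
        lines = [l.decode("utf-8", "replace") for l in complete.split(b"\n") if l]
        # bear's start-of-line marker check could slot in here: skip any
        # chunk containing more than one marker
        # order by the leading HH:MM timestamp; the sort is stable, so lines
        # sharing a timestamp keep their file order
        lines.sort(key=lambda l: l[:5])
        return lines
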
19:09 |
|
prologic |
yeap |
19:21 |
|
prologic |
So this may all be thwarted by DO (DigitalOcean) itself |
19:21 |
|
prologic |
testing if a volume can be attached to two droplets |
19:24 |
|
bear |
I doubt if they allow that |
19:25 |
|
prologic |
yeap confirmed |
19:25 |
|
prologic |
it's block storage |
19:25 |
|
prologic |
I can't see how |
19:25 |
|
prologic |
how would you orchestrate the scsi commands from multiple sources :D |
19:27 |
|
prologic |
going to have to rethink this :D |
19:28 |
|
bear |
:) |
19:28 |
|
bear |
this is where I spin up redis or rmq |
19:36 |
|
prologic |
yeah |
19:36 |
|
prologic |
it's like I said |
19:36 |
|
prologic |
I need a quorum |
19:37 |
|
prologic |
because shared block storage / file systems are "too hard" :P |
23:47 |
|
pdurbin |
prologic: maybe you should write to a database instead |
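
One way that suggestion could look (SQLite and the schema here are purely illustrative; pdurbin doesn't name a database, and with the volume unshareable you'd want a networked one rather than a local file): a UNIQUE constraint plus INSERT OR IGNORE gives the de-duplication for free, and the database, not the filesystem, serializes the writers.

    import sqlite3

    def open_log_db(path="irclog.db"):
        db = sqlite3.connect(path)
        db.execute("""CREATE TABLE IF NOT EXISTS messages (
                          ts      TEXT NOT NULL,
                          nick    TEXT NOT NULL,
                          message TEXT NOT NULL,
                          UNIQUE (ts, nick, message)
                      )""")
        return db

    def log_line(db, ts, nick, message):
        # the second instance's copy of the same line hits the UNIQUE
        # constraint and is silently ignored
        with db:  # commits (or rolls back) the transaction
            db.execute("INSERT OR IGNORE INTO messages VALUES (?, ?, ?)",
                       (ts, nick, message))

    # usage: log_line(open_log_db(), "19:05", "prologic", "There.")
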