IRC log for #sourcefu, 2017-06-04

http://sourcefu.com

All times shown according to UTC.

Time S Nick Message
12:55 pdurbin Hmm, "sourcefu is a software development community" doesn't say much. Maybe I need to emphasize the "fu" part, which I added back when we talked about it at http://irclog.greptilian.com/sourcefu/2015-08-16
12:55 pdurbin Or register a different domain, come up with a better name.
13:22 pdurbin tumdedum: which of these topics (or others) are you most interested in? https://github.com/sourcefu/sourcefu.github.com/blob/master/_includes/topics.yaml
13:30 pdurbin aditsu bear codex dotplus prologic semiosis sivoais_ tumdedum I just threw together an image at http://i.imgur.com/Z9aSjXM.png where I tried to jot down what I think you're most interested in. Please let me know what you think.
13:31 pdurbin And thanks for hanging out in #sourcefu with me. :)
13:38 aditsu pdurbin: from the ones you listed, I'm also interested in SQL and Python, and to a lesser degree in some of the other ones
13:41 aditsu you could add Mercurial, algorithms and competitive programming..
13:46 aditsu btw, I wrote my most complex Haskell program yet for the google code jam this year :p
13:47 aditsu (it's also probably the second program :p)
13:56 pdurbin aditsu: thanks, I know you're into all sorts of languages. I'll fix it up and push it into git once others have weighed in.
13:59 aditsu it's mainly about competing for the "multiple languages" statistics in the gcj :) I don't really have time to learn and work with that many languages normally
13:59 pdurbin yeah, makes sense
13:59 pdurbin for now I updated http://tmp.greptilian.com/tmp/sourcefu
14:01 aditsu nice
14:03 aditsu well, if you added those 3 new things, you might as well link me to them
14:32 pdurbin done
15:13 aditsu hmm, looks like you're not interested in python anymore :p
15:14 pdurbin heh, no, I am, I'm just trying to reduce the clutter visually :)
15:15 pdurbin I'll be even more interested in Python once everyone is on Python 3.
15:22 pdurbin aditsu: I forget if I asked you if you'd be interested in adding yourself to https://github.com/sourcefu/sourcefu.github.com/tree/master/members
15:23 aditsu what for?
15:25 pdurbin because I like what folks have written at http://sourcefu.com/members/bear and http://sourcefu.com/members/prologic and http://sourcefu.com/members/sivoais
15:25 pdurbin no pressure, it's just an open invitation to everyone
18:21 bear for me - link to Bash and Docker
18:25 pdurbin bear: fixed, thanks (in tmp)
18:25 bear +1
18:31 prologic anyone around?
18:31 prologic got a data synchronization / distributed systems problem to solve
18:38 bear oh my
18:39 bear that's like oil and water - you can find something that works for a little while, but it always separates in the end
18:40 bear is the data required to be sync when receiving or sync as in same copy on multiple nodes
18:41 bear i.e. is it "message pushed to all receivers such that they all act at the same time" or "we have 20 items and we want eventually to have all be on 100 nodes"
18:45 prologic Okay this is hopefully much simpler:
18:45 prologic https://github.com/circuits/irclogger is the thing I'm talking about
18:45 prologic A distributed (well trying to be) irc logger bot with web interface
18:46 prologic it writes raw logs to files on disk in a format that irclog2html (3rd party) can scrape into static html files
18:46 bear so it's a stream of log items from a single source to 0-N receivers
18:46 prologic I'm setting up a replacement system for this (old one has issues) and now have a high-availability block storage volume attached to the new VM
18:47 prologic which I assume can also be attached to multiple VMs simultaneously
18:47 prologic well if I run two instances of the bot with identical configuration
18:47 prologic it'll be N-streams that need to converge I guess
18:48 bear yea, that's a merge issue - some place you have to have a choke point that can know if a message has been seen and discard it
18:48 prologic yeah
18:48 bear your biggest issue there is time sync
18:48 prologic can two processes append to a file at the same time without corruption?
18:48 prologic I'm not sure tbh
18:48 bear I have never thought to even try
18:48 prologic me neither
18:49 prologic alternatively N-streams of files to the shared volume
18:49 prologic and a separate process that merges the streams
18:49 bear I would push to a queue and then have a small in memory cache of prior items for a single process to merge
18:49 prologic hmm
18:50 bear using a queue also allows you to get a real-time(ish) stream for your web side to receive updates
18:50 prologic is what I'm trying to do considered a quorum problem?
18:50 prologic in which case you'd need a minimum of 3 instances to agree on the state of the stream
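prologic's "minimum of 3 instances" follows from majority-quorum arithmetic. A minimal sketch, assuming a simple strict-majority quorum (the function names are illustrative, not from irclogger or circuits): with n nodes a decision needs n // 2 + 1 votes, so the group keeps working after (n - 1) // 2 failures. Two nodes tolerate no failures at all, which is why three is the smallest size that can lose a node.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be lost while a majority can still be formed."""
    return (n - 1) // 2


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, majority_quorum(n), tolerated_failures(n))
    # 1 1 0
    # 2 2 0
    # 3 2 1
    # 4 3 1
    # 5 3 2
```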
18:50 prologic I don't wanna make this too hard :D
18:50 prologic irclogger is only ~350 loc
18:51 bear yes, you are in the C part of CAP
18:51 prologic yeah sadly the web side is rather nasty (quick 'n dirty)
18:51 bear I would do redis or rmq to push into
18:51 prologic http://irclogs.shortcircuit.net.au/
18:51 bear and then a small reader that groks the hash of the line
18:52 prologic and if we've seen that line before skip it?
18:53 bear yep
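The design bear describes (push every line into a queue, then have a single reader hash each line and skip repeats, using a small in-memory cache of prior items) can be sketched in Python. This is an illustration under assumptions: Python's queue.Queue stands in for Redis or RabbitMQ, threads stand in for separate bot instances, and SeenCache, produce, merge and run_demo are hypothetical names. Hash-based dedup only works if every instance produces byte-identical lines for the same message, so a line must not contain anything local to one instance, such as its own receive time.

```python
import hashlib
import queue
import threading
from collections import OrderedDict


class SeenCache:
    """Remembers hashes of the most recent `capacity` lines; the oldest are evicted first."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._hashes = OrderedDict()

    def check_and_add(self, line):
        """Return True the first time a line is seen, False if it is a repeat."""
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        if digest in self._hashes:
            return False
        self._hashes[digest] = None
        if len(self._hashes) > self.capacity:
            self._hashes.popitem(last=False)  # forget the oldest hash
        return True


def produce(q, lines):
    """One logger instance: push every line it received, then a None sentinel."""
    for line in lines:
        q.put(line)
    q.put(None)


def merge(q, producers, out, cache):
    """The single choke point: keep each distinct line once, stop after all sentinels."""
    finished = 0
    while finished < producers:
        line = q.get()
        if line is None:
            finished += 1
        elif cache.check_and_add(line):
            out.append(line)


def run_demo(lines, instances=2):
    """Feed the same lines from several 'instances' and return the merged output."""
    q = queue.Queue()
    out = []
    consumer = threading.Thread(target=merge, args=(q, instances, out, SeenCache()))
    consumer.start()
    workers = [threading.Thread(target=produce, args=(q, lines)) for _ in range(instances)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    consumer.join()
    return out
```

A bounded cache keeps memory flat, but a duplicate that arrives after its first copy has been evicted will be written twice, so the capacity has to exceed how far apart the instances can drift.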
18:54 prologic kinda funny though
18:54 prologic Logger (referring to the source) component already is a queue internally
18:54 prologic the way circuits itself is designed as a framework supports events and distributed messaging
18:55 bear yea, you could use one of the circuits internals as the queue and writer part
18:55 bear no need for redis or rmq if you don't mind having everything be in memory
18:55 bear (which, having just pressed enter, I realize that is exactly what redis and rmq are for the most part)
18:56 prologic :D
18:56 prologic only problem is coordination
18:56 prologic but I realize looking at the code again
18:56 prologic we also write to raw files opened in append mode
18:57 prologic so this may be simpler than I thought as long as I can find out / or verify myself that N or more writers can safely append to the same file
18:57 prologic which if I remember my UNIX(ish) I think is the case
18:57 bear I know that they can both write, it's just a matter of the assumption that ordering will be valid
18:58 bear if you don't mind the occasional out of order line ... go for it
19:00 prologic Found something relevant: https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix
19:00 prologic I think the ordering doesn't matter tbh
19:01 prologic because that can be sorted on the processing side
19:01 bear yea, ordering (aka file tearing) only appears for READS
19:01 prologic > However concurrent reads to atomic appends may see torn writes depending on OS, filing system, and what flags you opened the file with - the increment of the maximum file extent is atomic, but the visibility of the writes with respect to reads may or may not be atomic. Here is a quick summary by flags, OS and filing system:
19:01 bear O_APPEND is atomic for lines smaller than the buffer
19:02 prologic actually I think if you read the 3rd answer that's wrong
19:02 prologic writes with O_APPEND seem to be atomic regardless of buffer size
19:02 bear oh, that's from memory from back before ext4 was even a thought
19:02 prologic it's the visibility from reads that isn't atomic, depending on the OS and filesystem
19:02 bear cool
19:03 bear then yea, you could use a temp file as your queue
19:03 prologic so the processing may have to take that into account
19:03 prologic I already do :)
19:03 prologic irclogger -> raw log files per day (rotated) -> html files (processed by irclog2html)
19:03 bear if you order based on the irc server's timestamp and not the processors - you should be ok
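Ordering by the server's timestamp rather than the processor's lets per-instance logs be merged after the fact. A sketch, assuming each raw line starts with an ISO-8601 timestamp supplied by the server (for example through the IRCv3 server-time capability, which irclogger may or may not request) and that each instance's file is already in that order; the line format and function names are hypothetical. heapq.merge then interleaves the streams lazily without re-sorting them. Removing messages that both instances logged is a separate step.

```python
import heapq
from datetime import datetime


def server_time(line):
    """Parse the leading server timestamp of a (hypothetical) raw log line."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def merge_by_server_time(*streams):
    """Merge per-instance streams, each already in server-time order, into one ordered list."""
    return list(heapq.merge(*streams, key=server_time))


if __name__ == "__main__":
    instance_a = [
        "2017-06-04T18:48:00.100Z <alice> one",
        "2017-06-04T18:48:05.000Z <alice> three",
    ]
    instance_b = ["2017-06-04T18:48:02.000Z <bob> two"]
    for line in merge_by_server_time(instance_a, instance_b):
        print(line)
```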
19:05 prologic > So, to answer the OP's question, O_APPEND writes will not interfere with one another, but reads concurrent to O_APPEND writes will probably see torn writes unless O_DIRECT is on, whereupon your O_APPEND writes would need to be a sector size multiple.
19:05 prologic There.
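The quoted conclusion about concurrent O_APPEND writers can be checked empirically on a local POSIX filesystem. This sketch starts several independent Python processes that append to one file opened with O_APPEND, writing each line with a single os.write(), then verifies that no line was split or interleaved. It only exercises writers; it says nothing about torn reads, O_DIRECT, or network filesystems, and the file name and line format are made up for the test.

```python
import os
import re
import subprocess
import sys
import tempfile

# Program run by each writer process. It opens the shared file with O_APPEND and
# emits every line with a single os.write() call, so each write lands at the
# current end of file instead of at a stale offset.
CHILD = r"""
import os, sys
path, tag, count = sys.argv[1], sys.argv[2], int(sys.argv[3])
fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
try:
    for i in range(count):
        os.write(fd, ("%s line %05d\r\n" % (tag, i)).encode())
finally:
    os.close(fd)
"""

GOOD_LINE = re.compile(rb"w\d+ line \d{5}")


def append_concurrently(path, writers=4, lines_each=2000):
    """Start several independent processes that all append to `path` at the same time."""
    procs = [
        subprocess.Popen([sys.executable, "-c", CHILD, path, "w%d" % n, str(lines_each)])
        for n in range(writers)
    ]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError("writer exited with status %d" % p.returncode)


def check_lines(path):
    """Return (line count, malformed lines, bytes after the last \\r\\n)."""
    with open(path, "rb") as f:
        data = f.read()
    lines = data.split(b"\r\n")
    tail = lines.pop()  # empty when the file ends with a complete line
    bad = [line for line in lines if not GOOD_LINE.fullmatch(line)]
    return len(lines), bad, tail


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        shared = os.path.join(d, "shared.log")
        append_concurrently(shared)
        total, bad, tail = check_lines(shared)
        print(total, len(bad), tail)
```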
19:06 prologic But the only thing left now is "how are torn writes seen from readers"
19:06 prologic maybe this manifests from the reader as incomplete lines?
19:06 bear yea, I wonder how often you would get partial line reads
19:06 prologic that's the only thing I can think of
19:06 prologic yeah
19:06 prologic so I'll just guard against a missing \r\n on the line
19:07 bear I think we are in the realm of "would have to be a lot of lines written *very* quickly" territory
19:07 bear yea, if you could add a start-of-line marker and then check for > 1 per newline...
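Guarding against torn or partial reads on the processing side can be sketched as a reader that only releases lines terminated by \r\n and holds back the rest until more data arrives, combined with bear's start-of-line marker check. The marker byte (\x1e, ASCII record separator) and the LineReader class are hypothetical; the log does not say irclogger writes such a marker.

```python
MARKER = b"\x1e"  # hypothetical start-of-line marker (ASCII record separator)


class LineReader:
    """Tails a log file and only hands out lines that end in \\r\\n."""

    def __init__(self, path):
        self._f = open(path, "rb", buffering=0)  # unbuffered: every read asks the OS
        self._pending = b""  # bytes of a line whose \r\n has not arrived yet

    def poll(self):
        """Read whatever is new; return (clean lines, suspicious lines)."""
        self._pending += self._f.read() or b""
        *complete, self._pending = self._pending.split(b"\r\n")
        clean, suspicious = [], []
        for line in complete:
            # A clean line starts with exactly one marker; a second marker before
            # the newline suggests two writes were interleaved or torn.
            if line.startswith(MARKER) and line.count(MARKER) == 1:
                clean.append(line[len(MARKER):])
            else:
                suspicious.append(line)
        return clean, suspicious

    def close(self):
        self._f.close()
```

What to do with suspicious lines (re-read the file later, or drop them) is left open here.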
19:09 prologic yeap
19:21 prologic So this may all be thwarted by DO (DigitalOcean) itself
19:21 prologic testing if a volume can be attached to two droplets
19:24 bear I doubt if they allow that
19:25 prologic yeap confirmed
19:25 prologic it's block storage
19:25 prologic I can't see how
19:25 prologic how would you orchestrate the scsi commands from multiple sources :D
19:27 prologic going to have to rethink this :D
19:28 bear :)
19:28 bear this is where I spin up redis or rmq
19:36 prologic yeah
19:36 prologic it's like I said
19:36 prologic I need a quorum
19:37 prologic because shared block storage / file systems are "too hard" :P
23:47 pdurbin prologic: maybe you should write to a database instead
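Writing to a database moves the "have we seen this message?" choke point into a uniqueness constraint. A minimal sketch using Python's built-in sqlite3, with a hypothetical messages table keyed by a SHA-256 of the server timestamp, channel, nick and text: INSERT OR IGNORE lets every instance insert blindly, and only the first copy is stored. SQLite itself lives in a local file, so instances on two VMs would need a networked database instead; PostgreSQL's INSERT ... ON CONFLICT DO NOTHING is the closest equivalent.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    msg_hash  TEXT PRIMARY KEY,  -- identical messages map to the same key
    server_ts TEXT NOT NULL,
    channel   TEXT NOT NULL,
    nick      TEXT NOT NULL,
    body      TEXT NOT NULL
)
"""


def message_hash(server_ts, channel, nick, body):
    """Stable key for one IRC message; NUL separators keep the fields unambiguous."""
    key = "\x00".join((server_ts, channel, nick, body)).encode("utf-8")
    return hashlib.sha256(key).hexdigest()


def log_message(conn, server_ts, channel, nick, body):
    """Store a message; return True if it was new, False if already logged."""
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO messages (msg_hash, server_ts, channel, nick, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (message_hash(server_ts, channel, nick, body), server_ts, channel, nick, body),
    )
    conn.commit()
    return conn.total_changes > before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Two logger instances report the same message; only one row is kept.
    for instance in ("a", "b"):
        print(instance, log_message(conn, "2017-06-04T23:47:00.000Z", "#sourcefu", "alice", "hi"))
    for row in conn.execute("SELECT server_ts, nick, body FROM messages ORDER BY server_ts"):
        print(row)
```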
