Time |
S |
Nick |
Message |
12:55 |
|
pdurbin |
Hmm, "sourcefu is a software development community" doesn't say much. Maybe I need to emphasize the "fu" part, which I added back when we talked about it at http://irclog.greptilian.com/sourcefu/2015-08-16 |
12:55 |
|
pdurbin |
Or register a different domain, come up with a better name. |
13:22 |
|
pdurbin |
tumdedum: which of these topics (or others) are you most interested in? https://github.com/sourcefu/sourcefu.github.com/blob/master/_includes/topics.yaml |
13:30 |
|
pdurbin |
aditsu bear codex dotplus prologic semiosis sivoais_ tumdedum I just threw together an image at http://i.imgur.com/Z9aSjXM.png where I tried to jot down what I think you're most interested in. Please let me know what you think. |
13:31 |
|
pdurbin |
And thanks for hanging out in #sourcefu with me. :) |
13:38 |
|
aditsu |
pdurbin: from the ones you listed, I'm also interested in SQL and Python, and to a lesser degree in some of the other ones |
13:41 |
|
aditsu |
you could add Mercurial, algorithms and competitive programming.. |
13:46 |
|
aditsu |
btw, I wrote my most complex Haskell program yet for the google code jam this year :p |
13:47 |
|
aditsu |
(it's also probably only my second Haskell program :p) |
13:56 |
|
pdurbin |
aditsu: thanks, I know you're into all sorts of languages. I'll fix it up and push it into git once others have weighed in. |
13:59 |
|
aditsu |
it's mainly about competing for the "multiple languages" statistics in the gcj :) I don't really have time to learn and work with that many languages normally |
13:59 |
|
pdurbin |
yeah, makes sense |
13:59 |
|
pdurbin |
for now I updated http://tmp.greptilian.com/tmp/sourcefu |
14:01 |
|
aditsu |
nice |
14:03 |
|
aditsu |
well, if you added those 3 new things, you might as well link me to them |
14:32 |
|
pdurbin |
done |
15:13 |
|
aditsu |
hmm, looks like you're not interested in python anymore :p |
15:14 |
|
pdurbin |
heh, no, I am, I'm just trying to reduce the clutter visually :) |
15:15 |
|
pdurbin |
I'll be even more interested in Python once everyone is on Python 3. |
15:22 |
|
pdurbin |
aditsu: I forget if I asked you if you'd be interested in adding yourself to https://github.com/sourcefu/sourcefu.github.com/tree/master/members |
15:23 |
|
aditsu |
what for? |
15:25 |
|
pdurbin |
because I like what folks have written at http://sourcefu.com/members/bear and http://sourcefu.com/members/prologic and http://sourcefu.com/members/sivoais |
15:25 |
|
pdurbin |
no pressure, it's just an open invitation to everyone |
18:21 |
|
bear |
for me - link to Bash and Docker |
18:25 |
|
pdurbin |
bear: fixed, thanks (in tmp) |
18:25 |
|
bear |
+1 |
18:31 |
|
prologic |
anyone around? |
18:31 |
|
prologic |
got a data synchronization / distributed systems problem to solve |
18:38 |
|
bear |
oh my |
18:39 |
|
bear |
that's like oil and water - you can find something that works for a little while, but it always separates in the end |
18:40 |
|
bear |
is the data required to be in sync when received, or sync as in the same copy on multiple nodes |
18:41 |
|
bear |
i.e. is it "message pushed to all receivers such that they all act at the same time" or "we have 20 items and we want eventually to have all be on 100 nodes" |
18:45 |
|
prologic |
Okay this is hopefully much simpler: |
18:45 |
|
prologic |
https://github.com/circuits/irclogger is the thing I'm talking about |
18:45 |
|
prologic |
A distributed (well, trying to be) IRC logger bot with a web interface |
18:46 |
|
prologic |
it writes raw logs to files on disk in a format that irclog2html (3rd party) can scrape into static html files |
18:46 |
|
bear |
so it's a stream of log items from a single source to 0-N receivers |
18:46 |
|
prologic |
I'm setting up a replacement system for this (old one has issues) and now have a high-availability block storage volume attached to the new VM |
18:47 |
|
prologic |
which I assume can also be attached to multiple VMs simultaneously |
18:47 |
|
prologic |
well if I run two instances of the bot with identical configuration |
18:47 |
|
prologic |
it'll be N-streams that need to converge I guess |
18:48 |
|
bear |
yea, that's a merge issue - somewhere you have to have a choke point that knows whether a message has been seen and discards it |
18:48 |
|
prologic |
yeah |
18:48 |
|
bear |
your biggest issue there is time sync |
18:48 |
|
prologic |
can two processes append to a file at the same time without corruption? |
18:48 |
|
prologic |
I'm not sure tbh |
18:48 |
|
bear |
I have never thought to even try |
18:48 |
|
prologic |
me neither |
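
A quick way to actually try it, in Python (the file name and line count below are made up): two processes append complete lines to the same file through an O_APPEND descriptor, one write() per line, and the parent then checks that nothing came out interleaved or truncated.

    import os
    from multiprocessing import Process

    PATH, LINES = "append-test.log", 10000   # hypothetical name / count

    def writer(tag):
        # O_APPEND descriptor; exactly one os.write() per complete line
        fd = os.open(PATH, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            for i in range(LINES):
                os.write(fd, ("%s %06d %s\n" % (tag, i, "x" * 80)).encode())
        finally:
            os.close(fd)

    if __name__ == "__main__":
        if os.path.exists(PATH):
            os.remove(PATH)
        procs = [Process(target=writer, args=(t,)) for t in ("A", "B")]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        with open(PATH) as f:
            lines = f.read().splitlines()
        # every line must still be a single, whole record from one writer
        assert len(lines) == 2 * LINES
        assert all(len(l.split()) == 3 and l.endswith("x" * 80) for l in lines)
        print("no interleaved or truncated lines")

The important detail is that each record is emitted with a single write() call to an O_APPEND descriptor, which is exactly the condition the O_APPEND discussion further down turns on.
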
18:49 |
|
prologic |
alternatively N-streams of files to the shared volume |
18:49 |
|
prologic |
and a separate process that merges the streams |
18:49 |
|
bear |
I would push to a queue and then have a small in memory cache of prior items for a single process to merge |
18:49 |
|
prologic |
hmm |
18:50 |
|
bear |
using a queue also allows you to get a real-time(ish) stream for your web side to receive updates |
18:50 |
|
prologic |
is what I'm trying to do considered a quorum problem? |
18:50 |
|
prologic |
in which case you'd need a minimum of 3 instances to agree on the state of the stream |
18:50 |
|
prologic |
I don't wanna make this too hard :D |
18:50 |
|
prologic |
irclogger is only ~350 loc |
18:51 |
|
bear |
yes, you are in the C part of CAP |
18:51 |
|
prologic |
yeah sadly the web side is rather nasty (quick 'n dirty) |
18:51 |
|
bear |
I would do redis or rmq to push into |
18:51 |
|
prologic |
http://irclogs.shortcircuit.net.au/ |
18:51 |
|
bear |
and then a small reader that groks the hash of the line |
18:52 |
|
prologic |
and if we've seen that line before skip it? |
18:53 |
|
bear |
yep |
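
A rough sketch of that queue-plus-hash approach in Python (Redis, the "irclog:lines" key, and the cache size are assumptions; bear only says "redis or rmq"): each bot instance pushes formatted lines onto a list, and a single merger keeps a bounded cache of recent line hashes so the duplicate copy from the other instance is dropped before it reaches the raw log file.

    import hashlib
    from collections import OrderedDict

    import redis  # pip install redis

    QUEUE, CACHE_SIZE, LOGFILE = "irclog:lines", 1000, "channel.log"

    def merger():
        r = redis.Redis()
        seen = OrderedDict()                  # bounded cache of recent hashes
        with open(LOGFILE, "a") as out:
            while True:
                _key, raw = r.blpop(QUEUE)    # blocks until a line arrives
                digest = hashlib.sha1(raw).hexdigest()
                if digest in seen:            # the other instance already sent it
                    continue
                seen[digest] = True
                if len(seen) > CACHE_SIZE:
                    seen.popitem(last=False)  # evict the oldest hash
                out.write(raw.decode("utf-8", "replace"))
                out.flush()

    # each bot instance would do something like:
    #   redis.Redis().rpush(QUEUE, "18:49 <prologic> ...\n")
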
18:54 |
|
prologic |
kinda funny though |
18:54 |
|
prologic |
the Logger component (referring to the source) is already a queue internally |
18:54 |
|
prologic |
the way circuits itself is designed as a framework, it supports events and distributed messaging |
18:55 |
|
bear |
yea, you could use one of the circuits internals as the queue and writer part |
18:55 |
|
bear |
no need for redis or rmq if you don't mind having everything be in memory |
18:55 |
|
bear |
(which, having just pressed enter, I realize that is exactly what redis and rmq are for the most part) |
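
A minimal sketch of keeping it inside circuits (the logline event and LogWriter component are made up for illustration, not what irclogger actually defines): handlers run on the framework's event loop, so a single writer component becomes the choke point bear described, at least for lines produced within one bot instance.

    from circuits import Component, Event

    class logline(Event):
        """Carries one formatted log line."""

    class LogWriter(Component):
        def __init__(self, path):
            super(LogWriter, self).__init__()
            self.path = path

        def logline(self, line):
            # handler name matches the event name; appends are serialized
            # through this one handler on the event loop
            with open(self.path, "a") as f:
                f.write(line + "\n")

    # elsewhere in the bot: self.fire(logline("18:55 <prologic> ..."))

Note this only serializes writes within a single instance; merging the streams from two instances still needs something like the queue above.
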
18:56 |
|
prologic |
:D |
18:56 |
|
prologic |
only problem is coordination |
18:56 |
|
prologic |
but I realize looking at the code again |
18:56 |
|
prologic |
we also write to raw files opened in append mode |
18:57 |
|
prologic |
so this may be simpler than I thought, as long as I can find out / verify myself that N or more writers can safely append to the same file |
18:57 |
|
prologic |
which if I remember my UNIX(ish) I think is the case |
18:57 |
|
bear |
I know that they can both write, it's just a matter of the assumption that ordering will be valid |
18:58 |
|
bear |
if you don't mind the occasional out-of-order line ... go for it |
19:00 |
|
prologic |
Found something relevant: https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix |
19:00 |
|
prologic |
I think the ordering doesn't matter tbh |
19:01 |
|
prologic |
because that can be sorted on the processing side |
19:01 |
|
bear |
yea, ordering (aka file tearing) only appears for READS |
19:01 |
|
prologic |
> However concurrent reads to atomic appends may see torn writes depending on OS, filing system, and what flags you opened the file with - the increment of the maximum file extent is atomic, but the visibility of the writes with respect to reads may or may not be atomic. Here is a quick summary by flags, OS and filing system: |
19:01 |
|
bear |
O_APPEND is atomic for lines smaller than the buffer |
19:02 |
|
prologic |
actually I think if you read the 3rd answer that's wrong |
19:02 |
|
prologic |
writes with O_APPEND seem to be atomic regardless of buffer size |
19:02 |
|
bear |
oh, that's from memory from back before ext4 was even a thought |
19:02 |
|
prologic |
it's the visibility from reads that isn't, depending on the OS and filesystem |
19:02 |
|
bear |
cool |
19:03 |
|
bear |
then yea, you could use a temp file as your queue |
19:03 |
|
prologic |
so the processing may have to take that into account |
19:03 |
|
prologic |
I already do :) |
19:03 |
|
prologic |
irclogger -> raw log files per day (rotated) -> html files (processed by irclog2html) |
19:03 |
|
bear |
if you order based on the irc server's timestamp and not the processors - you should be ok |
19:05 |
|
prologic |
> So, to answer the OP's question, O_APPEND writes will not interfere with one another, but reads concurrent to O_APPEND writes will probably see torn writes unless O_DIRECT is on, whereupon your O_APPEND writes would need to be a sector size multiple. |
19:05 |
|
prologic |
There. |
19:06 |
|
prologic |
But the only thing left now is "how are torn writes seen by readers" |
19:06 |
|
prologic |
maybe this manifests from the reader as incomplete lines? |
19:06 |
|
bear |
yea, I wonder how often you would get partial line reads |
19:06 |
|
prologic |
that's the only thing I can think of |
19:06 |
|
prologic |
yeah |
19:06 |
|
prologic |
so I'll just guard against a missing \r\n on the line |
19:07 |
|
bear |
I think we are in the realm of "would have to be a lot of lines written *very* quickly" territory |
19:07 |
|
bear |
yea, if you could add a start-of-line marker and then check for > 1 per newline... |
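
A sketch of that guard on the reading side (assuming each record is a single line starting with an HH:MM timestamp; the function name is made up): ignore a torn trailing line that has no newline yet, and sort by the IRC timestamp the bot wrote rather than by which instance won the append, before handing the day's file to irclog2html.

    def read_complete_lines(path):
        with open(path, "rb") as f:
            data = f.read()
        # a torn write shows up as a final chunk with no newline; leave it
        # for the next pass instead of emitting half a message
        complete, _sep, _partial = data.rpartition(b"\n")
        lines = [l.decode("utf-8", "replace") for l in complete.split(b"\n") if l]
        # bear's start-of-line marker check could slot in here: skip any
        # chunk containing more than one marker
        # order by the leading HH:MM timestamp; the sort is stable, so lines
        # sharing a timestamp keep their file order
        lines.sort(key=lambda l: l[:5])
        return lines
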
19:09 |
|
prologic |
yeap |
19:21 |
|
prologic |
So this may all be thwarted by DO (DigitalOcean) itself |
19:21 |
|
prologic |
testing if a volume can be attached to two droplets |
19:24 |
|
bear |
I doubt if they allow that |
19:25 |
|
prologic |
yeap confirmed |
19:25 |
|
prologic |
it's block storage |
19:25 |
|
prologic |
I can't see how |
19:25 |
|
prologic |
how would you orchestrate the scsi commands from multiple sources :D |
19:27 |
|
prologic |
going to have to rethink this :D |
19:28 |
|
bear |
:) |
19:28 |
|
bear |
this is where I spin up redis or rmq |
19:36 |
|
prologic |
yeah |
19:36 |
|
prologic |
it's like I said |
19:36 |
|
prologic |
I need a quorum |
19:37 |
|
prologic |
because shared block storage / file systems are "too hard" :P |
23:47 |
|
pdurbin |
prologic: maybe you should write to a database instead |
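
One way that suggestion could look (SQLite and the schema here are purely illustrative; pdurbin doesn't name a database, and with the volume unshareable you'd want a networked one rather than a local file): a UNIQUE constraint plus INSERT OR IGNORE gives the de-duplication for free, and the database, not the filesystem, serializes the writers.

    import sqlite3

    def open_log_db(path="irclog.db"):
        db = sqlite3.connect(path)
        db.execute("""CREATE TABLE IF NOT EXISTS messages (
                          ts      TEXT NOT NULL,
                          nick    TEXT NOT NULL,
                          message TEXT NOT NULL,
                          UNIQUE (ts, nick, message)
                      )""")
        return db

    def log_line(db, ts, nick, message):
        # the second instance's copy of the same line hits the UNIQUE
        # constraint and is silently ignored
        with db:  # commits (or rolls back) the transaction
            db.execute("INSERT OR IGNORE INTO messages VALUES (?, ?, ?)",
                       (ts, nick, message))

    # usage: log_line(open_log_db(), "19:05", "prologic", "There.")
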