14 posts tagged “oscon”
This picture was randomly taken of me at OSCON 2006. I was standing in the front area waiting for everybody else to get out of the talks they were in so we could go to lunch.
For being a random photo, I think it turned out really well. I feel that it captured my generally thoughtful nature, as it seems I'm distantly thinking about something. I love the coloring, I don't look white as a sheet for once. My hair is also freshly cut, which I think adds to the picture.
The background is just fuzzy enough to give you some information about what's going on. The sign shows 'Directory' which is legible and perfectly shows that I'm standing near the front of something. It's just... perfect, I think.
So yeah, thanks to Flickr user "skippy", even though I don't know who you are, you managed to capture perhaps my favorite picture of myself.
Now at the talk about the LJ "trifecta" - memcached, Perlbal, and MogileFS. This talk is given by Brad Whitaker and Artur Bergman. They asked me to be here in case they get stumped with any questions.
I'm not really going to chronicle the talk here. (But in the future I should write about them, perhaps. Or just work on the docs, or maybe just work on fixing bugs/adding features, or whatever. Hi.)
Talk given by Arjen Lentz, Community Relations Manager, MySQL AB. (Don't you love how half the time I get the name... it depends on how long they leave the first slide up. Hurrah.)
Concurrency control is the mechanism required to manage multiple users accessing the same resource (in this case, tables and rows). In general the goal is to implement this with the least possible overhead, optimized for what you're trying to accomplish. Every approach has pros and cons. Also, this talk is not specific to MySQL at all.
Without any rules, obviously you're going to run into problems. Connection A and B both do a select, then an update, but B is a little behind, so someone's update gets lost. Whatever concurrency control does, it has to prevent this sort of situation.
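To make the lost update concrete, here is a minimal sketch using a plain Python dict as a stand-in for a row. Nothing here talks to a real database, and the balance and amounts are made up for illustration.

```python
# Hypothetical in-memory "row"; no real database involved.
row = {"balance": 100}

# Connection A and connection B both SELECT the current value.
a_seen = row["balance"]
b_seen = row["balance"]

# A writes back its result first...
row["balance"] = a_seen + 50
# ...then B, a little behind, writes a result based on its stale read.
row["balance"] = b_seen - 30

# A's +50 is gone: we end at 70 instead of the 120 both updates imply.
lost_update_result = row["balance"]
print(lost_update_result)
```

Either update alone is fine. It's the interleaving that loses data, which is exactly what concurrency control has to rule out.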
You must also avoid "uncommitted dependencies": one person does something, another does something using the results of the first, and then the first person issues a rollback. This is similar to something called "inconsistent analysis", where someone is working on rows that change underneath them, and you end up with bad data.
One big concept here is locking granularity. You want to lock things, but how low level do you want to go? SQLite does it on the database level, MySQL/MyISAM and MySQL/MEMORY do it at the table level, BerkeleyDB and Ingres do it at the page level, and pretty much everybody else uses row level locking.
Once you've got granularity, you have types of locks. Shared read locks allow other people to read rows, as long as they don't change them. Update locks mean you intend to update the row. Nobody else can get an update lock while you've got one. Exclusive locks mean it's just you. All you!
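The three lock types can be read as a small compatibility table. The sketch below is one interpretation of the description above, not any particular database's lock manager. The names `COMPATIBLE` and `can_grant` are made up, and it assumes an update lock can coexist with shared locks (the notes only say a second update lock is refused).

```python
# Lock modes as described in the talk: shared (read), update, exclusive.
SHARED, UPDATE, EXCLUSIVE = "shared", "update", "exclusive"

# Pairs of modes that two different transactions may hold on the same row.
# Assumption: an update lock coexists with shared locks; that is one reading
# of "nobody else can get an update lock while you've got one".
COMPATIBLE = {
    frozenset([SHARED]),          # shared + shared
    frozenset([SHARED, UPDATE]),  # shared + update
}

def can_grant(requested, held):
    """Return True if `requested` is compatible with every lock in `held`."""
    return all(frozenset([requested, h]) in COMPATIBLE for h in held)

print(can_grant(SHARED, [SHARED, UPDATE]))  # readers can pile on
print(can_grant(UPDATE, [UPDATE]))          # second updater has to wait
print(can_grant(EXCLUSIVE, [SHARED]))       # exclusive means just you
```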
Then, after locking is implemented, you want transactions. It's certainly not always necessary, but it's definitely something that needs supporting. It allows multiple commands to execute within a safe area where they can't get run over. This is using the ACID rules...
Atomicity - one or more operations are a single unit of work, and indivisible. Either all or nothing.
Consistency - the database must always move from one consistent state to another. Inconsistencies may exist during a transaction, but must not after you commit or rollback.
Isolation - changes in flight (in an uncommitted transaction) must not be visible to other transactions.
Durability - once committed, data must persist somewhere/somehow.
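Atomicity is the easiest of the four to show in a few lines. This is a minimal sketch using Python's built-in sqlite3 module rather than MySQL; the `accounts` table and the `transfer` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts as one all-or-nothing unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Inconsistent mid-transaction state; bail out before commit.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed `transfer` leaves both rows exactly as they were, even though the first UPDATE had already run inside the transaction.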
(This is all stuff I know/am familiar with. The talk is a bit geared towards newer people and reminds me a lot about the database courses at school...)
There are now four isolation levels. Not all of these are implemented by every database server, but some of them are. For completeness here they are:
Read Uncommitted - doesn't solve isolation, you can see data that is coming from other transactions that are in flight. You can get bad data/dirty reads this way.
Read Committed - you only see data that is committed, and you don't see uncommitted transactions. This can cause non-repeatable reads, which is where you're reading data, someone commits a transaction, and you do the same read again, getting new data.
Repeatable Read - you only see committed transactions that were committed before you started your transaction. Except for phantoms. Which is where you select data that doesn't exist, then someone inserts the row, and you select again - now you get the row. (InnoDB prevents this by locking the gap on the rows that don't exist.)
Serializable - transactions are completely isolated from each other. The problem is that every single select requires a shared lock. Prevents phantoms, regardless of the database you use. Can be very slow.
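The four levels boil down to which read problems each one still lets through. Here is that summary as a lookup table, a sketch built only from the descriptions above. The helper name `weakest_level_preventing` is made up, and a real engine may behave more strictly than this (the InnoDB gap-locking note is one example).

```python
# Read phenomena each isolation level still permits, per the notes above.
ALLOWED_ANOMALIES = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom"},
    "READ COMMITTED": {"non-repeatable read", "phantom"},
    "REPEATABLE READ": {"phantom"},
    "SERIALIZABLE": set(),
}

def weakest_level_preventing(anomalies):
    """Return the least strict level that permits none of `anomalies`."""
    unwanted = set(anomalies)
    # Dicts keep insertion order, so this walks from weakest to strictest.
    for level, allowed in ALLOWED_ANOMALIES.items():
        if not allowed & unwanted:
            return level

print(weakest_level_preventing(["dirty read"]))
print(weakest_level_preventing(["phantom"]))
```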
So, a higher isolation level is more correct, but hurts concurrency. It's a huge tradeoff to make. If your database doesn't implement the isolation level someone asks for, it should escalate to the next higher level that it does implement. (Unfortunately, some databases don't do this.)
When doing locking, escalate to the next higher level if you're starting to do a lot of it. If you're starting to lock a ton of rows, lock pages. Lots of pages? Lock tables. This is not stuff that the end user decides, this is all dictated by the database. Set it up to save memory as appropriate.
There are two types of locking... optimistic, and pessimistic. The former is to only take action when necessary, the latter assumes that you're going to run into problems, so lock things all the time just to make sure we don't have problems. (At this point he's skimming through things, as we're 5 minutes to the end of the session.)
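Since the speaker skimmed this part, here is a sketch of one common way to do the optimistic flavor: a version column that is checked at write time. This is not from the talk. It uses Python's sqlite3 module, and the `docs` table and helper names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO docs VALUES (1, 'first draft', 1)")
conn.commit()

def read_doc(conn, doc_id):
    """Return (body, version) without taking any lock up front."""
    return conn.execute(
        "SELECT body, version FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()

def optimistic_update(conn, doc_id, new_body, seen_version):
    """Write only if nobody changed the row since we read `seen_version`."""
    cur = conn.execute(
        "UPDATE docs SET body = ?, version = version + 1 WHERE id = ? AND version = ?",
        (new_body, doc_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows touched means we lost the race
```

Two writers who both read version 1 can both try to save, but only the first one succeeds. The second gets `False` back and has to re-read and retry. A pessimistic design would have made the second writer wait at read time instead.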
There are other concepts here, such as two-phase locking, multi-versioned concurrency control (MVCC), and deadlocks. The last of these means you have to roll back transactions.
Most databases also support explicit locking - you can tell it to lock something. Not all databases support this, and it's not defined in the SQL standards. (Interesting - a community standard? Among databases? Huh.)
This talk is by Dr. Lars Thalmann regarding where MySQL replication is going.
First bit is about why and how replication works as it stands. People do it for high availability (failover), load-balancing, and having an online backup. It's done with the standard snapshot and binlog method that everybody uses. Hurrah, nothing new here.
New features in 5.0... auto-increment works in bi-directional replication (master-master), character set and timezone replication, stored procedure and triggers and views replication, and other advanced things such as FOREIGN_KEY_CHECKS, UNIQUE_CHECKS, SQL_AUTO_IS_NULL, and SQL_MODE.
New in 5.1... row-based logging and replication, dynamic switching of log format (between statement, default, and row), MySQL Cluster replication, etc. More detail coming up.
Fairly simple, but now MySQL allows you to change the auto_increment offset and increment on a server, so that one will generate all the odd numbers, one all the even. For more complex setups, you can make them do every third number, or every hundredth, or whatever.
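The arithmetic behind this is simple enough to sketch. The MySQL settings involved are auto_increment_increment and auto_increment_offset; the Python below only models the sequence of ids each server would produce, and the function name is made up.

```python
from itertools import count, islice

def auto_increment_values(offset, increment):
    """Yield the ids a server would hand out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)

# Two masters: same increment, different offsets, so ids never collide.
server_a = list(islice(auto_increment_values(offset=1, increment=2), 5))
server_b = list(islice(auto_increment_values(offset=2, increment=2), 5))

print(server_a)  # odd ids
print(server_b)  # even ids
```

For three masters you would use an increment of 3 with offsets 1, 2 and 3, and so on.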
The bigger change is that MySQL 5.1 allows row-based replication (RBR). Usually, entire statements are written to the binlog on the master, then the slave reads the statements, and executes them. Row-based goes into the binlog straight from the storage engine. Using RBR, it is possible to replicate when you're using such features as LOAD_FILE(), UUID(), USER(), FOUND_ROWS(), and UDFs. Statement-based replication (SBR) is the default, is proven technology, and lets you audit the queries being executed. This also allows you to have different table definitions on the master and slave.
Also in 5.1 are four new binlog events. A table map event, stating, "this table id maps this table definition". Then there are three row change commands: binwrite, binupdate, and bindelete. Oh neat! Using this RBR and if you have primary keys on your tables, the binlogs are idempotent. You can run the same log against the slave multiple times, and it won't hurt it. (Useful in some recovery situations - just replay all the logs! It won't hurt! No more calculating where we left off, etc etc. I like that.)
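Why primary keys make replay harmless is easier to see with a toy model. This sketch is nothing like the real binlog format. The event names and tuple layout are invented, and the "table" is just a dict keyed by primary key.

```python
# A toy "slave" table keyed by primary key, plus a toy row-based log.
# Event names and tuple layout are made up for illustration.
def apply_row_events(table, events):
    for kind, pk, row in events:
        if kind in ("write", "update"):
            table[pk] = row       # final row image wins, however often applied
        elif kind == "delete":
            table.pop(pk, None)   # deleting a missing row is a no-op
    return table

log = [
    ("write", 1, {"name": "brad"}),
    ("write", 2, {"name": "artur"}),
    ("update", 1, {"name": "brad w"}),
    ("delete", 2, None),
]

once = apply_row_events({}, log)
twice = apply_row_events(dict(once), log)  # replay the whole log again
print(once == twice)
```

Compare that with replaying a statement like "UPDATE t SET n = n + 1", which changes the result every time it runs.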
The format of the binlog doesn't seem to have changed. Same 19 byte common header. Then the new binlog event types. Moving on. RBR also provides lots of potential optimizations. If the master had to create a temporary table and do an hour of sorting to get the data to insert, the slave won't have to do that work. It just replicates the final results of the query from the master. This also means that if you do something like "INSERT INTO t1 SELECT * FROM t2" then the table t2 doesn't have to exist on the slave.
Another advantage of RBR is that now MySQL Cluster (the in-memory HA setup) supports replication. You can now replicate from one cluster to another. This enables you to have your machines in lock-step, synchronously replicating the data, so every machine is guaranteed up to date. Then you can asynchronously replicate the data to another setup, such as having one cluster in the US, one in Japan, providing HA at each location, and yet keep them relatively up to date with each other.
MySQL Cluster is looking very interesting now. The target for the replication doesn't even have to be Cluster, you can have an HA MySQL Cluster system, and replicate it to an InnoDB/MyISAM system for backup purposes or whatever else you want.
Now a huge slide with no less than 30 arrows describing how MySQL Cluster works. Yikes. Pardon me for not getting into detail - the slide's gone now so I don't really have a chance to type it up.
Another new thing, the binlog now has an "injector interface" which enables you to inject events into the binlog. He didn't provide much information, but this seems interesting for people who are writing plugins for MySQL.
A neat trick you can use is to have the master write SBRs and then set the slave to write RBRs, thereby converting them. Not sure what you'd want to do that for, but he suggested it. Heh.
Next steps are multi-source replication, online backup, conflict detection and resolution, and automatic failover. That is where MySQL is going next.
Python 3000 is the funny name for what will eventually be Python 3.0. This is planning for the way future, things that will happen much later. The next version 2.5 is coming out (hopefully) in early August.
Seems to be a big redesign. Without redesigning the entire language, he wants to fix early design bugs that have persisted, get rid of deprecated features (the old integer division, for one), and allow some incompatible changes. Seems a bunch like how radical Perl 6 is going to be. Looking at what's best going forward.
The process is that Guido is the main driver, but is leveraging the community. Wants to avoid being the next Perl 6. (Hah!) Trying to answer questions early - when is the goal to release, continue 2.x and 3.x simultaneously?, how incompatible can it be?, do we backport features from 3.x to 2.x? Things like that.
Regarding incompatibility ... new keywords are allowed, all strings are Unicode, don't use <> as alias for !=, but not things like changing precedence, or changing the meaning of the else clause on for/while.
Mechanical translation of 2.x to 3.x will not be fully possible. Some things will be pretty easy, but you can't be 100% certain. The most likely approach is the creation of some pychecker-like tool that will do 80% of the work, plus augmenting 2.x with warnings about features that will be deprecated come 3.x.
Some things that Python 3000 will not have. No programmable syntax or macros, no adding syntax for parallel iteration, no changing hash(), keys(), etc. to attributes, and no changing iterating over a dict to yield key-value pairs. They also should not change the look and feel of the language too much, make gratuitous and confusing changes, or add radical new features. (Can always do that later.)
For upcoming features, read PEP 3100, from this page. Also you can follow the mailing list, and he has a blog, but the URL went away too fast for me to copy.
Basic cleanup of Python 3000 will include killing classic classes, string exceptions, and exceptions that don't derive from the root of the exception hierarchy. Division of integer by integer will return a float, the last few semantic differences between int and long will be removed, and imports will be absolute by default. Kill ancient library modules, clean stdlib up a bunch. Kill off sys.exc_type, dict.has_key(), file.xreadlines(), apply(), input(), buffer(), coerce().
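A few of these cleanups are easy to poke at. The snippet below is a small sketch meant to be run under a released Python 3, so it shows how things ended up rather than what was on the slides. The variable names are arbitrary.

```python
# True division: int / int gives a float in Python 3.
half = 7 / 2     # 3.5
floor = 7 // 2   # 3, the explicit floor-division operator

# int and long are unified, so big values are just ints.
big = 2 ** 100

# dict.has_key() is gone; the `in` operator does the job.
d = {"oscon": 2006}
has_it = "oscon" in d

print(half, floor, type(big).__name__, has_it)
```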
Some minor changes... exec is a function again, kill `x` in favor of repr(x), change exception syntax slightly to yield a name for the exception, kill off "raise E, arg" in favor of "raise E(arg)". Also changing list generator expression from "list(f(x) for x in S)" to "[f(x) for x in S]".
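Here is roughly what those syntax changes look like in practice, again as a sketch for a released Python 3. The `parse_port` function and its error message are invented for the example.

```python
def parse_port(text):
    port = int(text)
    if not 0 < port < 65536:
        # "raise E(arg)" form: build the exception instance explicitly.
        raise ValueError("port out of range: " + repr(port))
    return port

try:
    parse_port("70000")
except ValueError as err:  # "as" gives the caught exception a name
    message = str(err)

# repr(x) replaces the old backtick syntax.
quoted = repr("linky")

# The bracket form builds the same list as list(<generator expression>).
squares = [n * n for n in range(4)]
same = squares == list(n * n for n in range(4))

print(message, quoted, squares, same)
```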
Apparently lambda lives. He doesn't like it and wants it to die, but figured they should try for a better version. They've tried for a year and come up with nothing, so, it's just going to live.
Strings are getting a revamp. Instead of having str and unicode, there will now be bytes and str. The former doesn't have things like bytes.upper() and such, they're just a string of bytes. The new str class is like the old unicode one, it always supports text.
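The split is easiest to see by converting between the two. This is a minimal sketch for a released Python 3, where the details may differ from what was planned at the time of the talk. The sample word is arbitrary.

```python
text = "na\u00efve"          # str: a sequence of Unicode code points
raw = text.encode("utf-8")   # bytes: what actually goes on the wire

lengths = (len(text), len(raw))  # 5 characters, 6 bytes (the i-diaeresis takes two)
round_trip = raw.decode("utf-8") == text

# The two types don't mix implicitly.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(lengths, round_trip, mixed_ok)
```

You always have to name an encoding to get from one type to the other, which is the point of keeping them separate.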
Now print is becoming a function, too. More information is in the PEP 3100 linked to above. Lots of discussion about this but he feels it's a barrier to the evolution of the language to have it as the old builtin that it was?
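As a function, print picks up ordinary keyword arguments. A small sketch for a released Python 3, writing into an in-memory buffer so the output can be inspected:

```python
import io

buf = io.StringIO()
# As a function, print takes keyword arguments instead of special syntax.
print("memcached", "Perlbal", "MogileFS", sep=", ", end="!\n", file=buf)
printed = buf.getvalue()

# And it can be passed around like any other callable.
log = print
log(printed, end="")
```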
The talk has now run overtime by a bit, so he's wrapping it up. Interesting stuff. Oh, and they're having a Sprint which is a way to get things done... see linky.
This talk is a panel including David Recordon, who worked on LiveJournal for about two years before moving on recently to another opportunity at Verisign. I don't know everybody else on the panel, but one is Tim O'Reilly.
Sadly, I was about 15 minutes late to the talk, so I'm trying to catch up and figure out what's going on.
Conversation currently seems to be encouraging some sort of craftsman model. You have people who are new, who become apprentices to masters who are building things. Someone now arguing that's not very useful, as ... I don't know. Something about if you work for a company and leave, you don't want that sort of shake-ups?
Lots of talk about how too much money can be as bad as too little. Also you don't want to isolate people from market pressure? O'Reilly gave Larry Wall money to work only on Perl, and hence you've got Perl 6 the Performance Art, which is still not out. (For the record, he does not regret funding Larry.)
There's an entirely different electricity in this room, too. This is... somewhat adversarial. Lots of bristling going on, people seeming to attack and defend. No outright hostility, but it just feels that way in here. It's interesting and strange. (Ben thinks that it's not yet at this point, but could take this turn. Hmm.)
Now they're asking for predictions.
"Two-thirds of all acquisitions ultimately fail" was just stated. I didn't know that - she said it's an axiom, I don't remember if that word means fact, or theory. Now they're changing the question up ... what happens when founders leave, post-acquisition. Some cases are fine, some are not. "If there's a strong community, then there will be some chaos. Thankfully, Open Source has the ultimate control - you can fork the codebase." Not really useful for social network type things.
Tim O'Reilly thinks that things are always changing. It's not necessarily true that the acquisitions will end up killing Open Source. The future may have no Open Source movement, but it will still have the hacker impulse. "It's not about Open Source, it's about the hacker impulse." So whether or not the future has people like Tatsuhiko making Plagger or Brad making LJ or Ben making MT, the future will still have hackers. Things will change, yes, but things will always be the same.
David Recordon says, "Don't lose the hacker mentality. Continue doing cool stuff, and continue innovating. Don't give up, if you really believe you need to represent something, take the chances and take the risk. Preserve the ideals. It's not all about cultures meshing, it's about ideals. If you don't represent your ideals in the corporate culture, you're doing a big disservice to the Open Source movement."
I don't know the next guy (Cliff Schmidt?), he is saying that the acquisition of Geronimo by IBM (not quite an acquisition?) has injected a lot of energy into the project. A lot of different energy. Says that Open Source is becoming more mainstream, more common. He looks forward to when Open Source is not something that people have "Open Source Coordinators" employees for, when it just ... is.
Denise says to "throw your body in front of the train, it really doesn't hurt that much to get run over". Hold together, keep the fight, keep it going. There will be pendulum swings, either people will accept it, or something new will come along. Doesn't want to lose what we have.
"Educating corporations is always second to us doing what we want to do. Do what you find interesting. Corporations will be interested if people are interested. Don't lose your power by trying to find the money."
I came in partway through this talk. It's about how to deal with people who are considered poisonous - not really helping in a project, or helping in a way that is counter to the community.
They had a few points, traits you have to have. Humility was the one that jumped out to me, and thick-skin. There will be people who try to detract, you have to take it as it comes. (This isn't a very useful paragraph, in reality.)
Do not commit a huge entire feature all by itself in one fell swoop. Do development in a branch, so people can follow. Have peer review. Make sure people actually follow along. This reduces the bus-factor -- the concept that, if one person did all the work and gets hit by a bus, you're screwed.
Do not allow names in files. You are contributing to a project, and you want to avoid people owning particular parts of it. If there are no names, people are more likely to work on other parts of the codebase, without thinking about someone getting pissed or whatever. You can still have people who are the experts at that section, but it's not a formal involvement, it's just ways to help out.
Do not allow email filibustering. This is when someone will just email and email and email, even though they're just one person, they're acting as the vocal minority. The big minority. Don't rehash issues. Let sleeping decisions lie, don't go over them every few weeks.
Have a well defined process. For release, accepting patches, and reviewing them. You also need process for adding new committers. Community founders establish culture, and then the culture becomes self-selecting. These types of people tend to come here, etc.
(Vox, why are you making me hit enter twice now, halfway through my post... you really are making me sad, really.)
Voting is a last resort. If your community is healthy, you should never come down to this. If you're built on respect and humility, then you should be able to accomplish things to make decisions. It can be useful when a disagreement is going on for way too long, and you're spending too much time on thinking about it. If you do make votes, don't make weighted votes.
And now - when to flip the "bozo bit".
Communications annoyances - such as silly nicknames (like mine, whoops), or multiple nicknames (they change them up so as not to seem the same person), too many capitals, or weird punctuation, or "omg wtfs" etc. These are just generalizations, things to watch for.
Also, if the person tends to never get the clue. They don't pick up the mood, or understand the goals. They just show up and spit things out and demonstrate they've not been around and watching. They don't respect the community or the current situation. People who don't RTFM.
Hostility - if the person is flat out insulting, probably a bozo. If they're demanding help, with no sense of propriety. Entitled. Blackmail. Trolling. Riling people up. Accusations or paranoia. These are dead giveaways that someone is not helping the community in any way and detracting from your effort.
Conceit is another issue. They will say things like, "Oh if you don't do this, then this is going to happen!" or "If you do this, it will be the best thing ever!" Or, bringing back up old topics, see above about the email filibustering or whatnot. They won't read the archive, think they're the best thing since sliced bread, and are rather annoying. You want to keep an eye on them, see what they're saying, but don't lend much weight to it.
Lack of cooperation - talk talk talk, complain complain complain. If they're not willing to discuss how to improve their ideas and design, then that's a big problem. If they're so thin-skinned that they're unable to take criticism, then it will take too much effort to get things fixed in anything they do, it may not be worth it.
NOW! What to do with these people you've flagged?
Is this person draining attention and focus? If so, is this person really likely to benefit the project? Next, is this person paralyzing the project? Is it worth it?
Don't feed the person. Don't feed the troll. Don't give them any leverage. Don't get emotional! (Editor's note, this is my big one... whoops...)
Do pay attention to newcomers. Even if they're annoying at first, don't reject out of hand. You don't want to nuke people for sounding silly as opposed to being hostile. They could be just warming up, nervous and not sure. Do look for facts, under the emotions. Do know when to give up and ignore the person. Do know when to boot someone from the community. Do repel the trolls with niceness!
In summary...
Understand what's going on, keep your attention and focus, keep your community united against trouble, and deal with any problems. Don't jump to conclusions ... be sure before you act. Deal with the facts.
"Stick to what you're there to do... write great open source software!"
Lots, not too many were interesting. Audrey Tang had a funny one, but it wasn't informative in any way other than how badly you can abuse Perl (and Ruby).
Maybe more later, am in next talk.