A brief digression of annual "divorces per marriage" statistics for Sweden,

Comparison of divorce rates between same- and opposite-sex unions in Sweden, 2000-2008

Update, June 2011

Having seen this and reminded myself that I should revisit this and see what changes have happened, I went off to gather some more statistics. Since 2009-05-01 there is no longer any distinction between same- and opposite-sex unions in Sweden, so it is no longer possible to extend these graphs into 2010 as was my intent. However, as the reason for this is greater equality, my idle number-crunching woes pale in comparison.

I briefly toyed with the idea of making the comparison for 2009, but those numbers very much look like artefacts of laws changing half-way through the data set. It also seems as if dissolutions of registered partnerships (the same-sex unions) have not entirely transformed into marriages, at least for statistical purposes (there are registered partnership dissolutions in the statistics for 2010, but naturally no new registrations).

Original

In this article, Nate Silver writes that the change in divorce rates in US states that allow or are at least not super-negative to same-sex unions have decreased or at least not increased as much as when compared with highly conservative states.

Following that thesis, I have taken a look at divorce rates in Sweden since same-sex unions (in the form of civil partnerships, originally) were allowed back in 1995. I was initially hoping to find union (and divorce) statistics back to at least a few years before that, but SCB (the Swedish Statistical Bureau) do not have opposite-sex union and divorce stats going back further than 2000. So, unfortunately, I cannot quite do what I had intended, but I can at least look at how same- and opposite-sex union and divorce compare.

We'll need some comparative numbers and based on not very much, the closest we can get is (probably) "unions" divided by "divorces" on a per annum basis.

Bear in mind that there's something subtly odd about these numbers. When "marriages" and "registered partnerships" are tallied separately, at least I would naively expect that "number of men married" would be the same as "number of women married" and "registered partners" be an even number, both in the "male" and "female" category. However, this does not seem to be the case.

There are a couple of possible explanation models, one (the one I currently find most likely) is that they're counting only Swedish citizens, causing a discrepancy as people marry (or partner) someone from abroad.

With that in mind, here are some graphs. The value graphed (for each of the categories "women getting married to women", "women getting married to men", "men getting married to men" and "men getting married to women" (denoted, in order, as "female, homo", "female, het", "male, homo" and "male, het") is, for each category, the annual "people getting divorced" divided by "people getting married". This hopefully eliminates all oddities or not around the stats, as long as the "hitched" and "split" numbers are calculated identically.

I have left off the first five years of same-sex partnerships, in the vain hope that any initial "let's get hitched!" excitement leading to a quick divorce has, after that time, stabilised. I don't know if that is the case, as I don't have different-sex marriage data to compare to.

With this in mind, we'd expect that straight people's divorce rates should track pretty close (as, typically, one male and one female are involved in an opposite-sex divorce), but it might still be interesting to compare divorce rates for same-sex couples.

The small difference between male and female numbers may well be that more Swedish men than women marry people from abroad, thus having a slight over-representation in the "married" statistics, while people divorcing may well be more equally Swedish citizens. It is hard to say from just the numbers.

There's also a case to be made that since same-sex unions have been around for a shorter time, it may be that the divorces are yet to happen. This is not something I can contro lfor (but, maybe, worth re-running this data exercise in a few years and see what's happened in the interim).

It may also be that since same-sex unions are comparatively newer and, possibly, seen as "more difficult to enter", that the people ending up in a same-sex union are less prone to splitting. One way of noticing this would be to see how same-sex unions and divorces track over time. Note that for the years 1995-1997, the data files only have ".." as data for same-sex divorces, I don't know if that means "no data available", "zero" or "we're not telling". I have thus been forced to assume that it is 0, for the purposes of the graph below.

If you want the original data, the files I have used are available (opposite- and same-sex)), but bear in mind that these are plain-text files, in Swedish.

Time and space complexity

Complexity of algorithms

In my essay on data structures, I talk about complexity. Specifically, I talk about time complexity. In the study of algorithms, there are smoe differenmt types of complexity that people (in general) care about. The two most important are time complexity and space complexity.

Time compexity is(roughly speaking) how the algorithm behaves as the input increases. Something with time complexity O(n) will take twice the time when the input doubles, something with time complexity O(N^2) will take four times longer when the input duobles and so on.

Space complexity is the storage required for the execution of the algorithm. This includes all permanent and temporary storage required by the algoritm, including any necessary stakc allocations for recursions,

If we look at three different implementations of the Fibonacci algorithm, one straight-forward recursive implementation, one iterative and one recursive memoised version, we can compare and contrast different trade-ofsf between time and space complexity.

First, the straight-forward recursive algorithm:

(defun fib (n)
  (cond ((< n 2) n)
        (t (+ (fib (- n 1)) (fib (- n 2))))))
The space complexity is O(n), mostly taken up by stack allocations. The time complexity is O(fib(n+1)), this is approximately O(e^n). Not the best performer in the world.

Next, we look at an iterative version.

(defun fib (n)
  (let ((a 0)
        (b 1))
    (dotimes (i n) (psetf a (+ a b) b a))
    a))
The time complexity has gone down to O(n) and the space complexity is down to O(1). Probably a good choice for most cases and still pretty straight-forward to analyse.

Last, we have the memoised Fibonacci:

(let ((memo (make-hash-table)))
  (setf (gethash 0 memo) 0)
  (setf (gethash 1 memo) 0)
  (defun fib (n)
    (let ((lookup (gethash n memo)))
      (cond (lookup lookup)
            (t (fib (- n 1) (fib (- n 2))))))))
This one is interesting. Space complexity is back up to O(n) (essentially, each computed value will take up constant space in the lookup table). However, run-time complexity is a bit harder to compute, but by looking carefully at what the algorithm does, we can see that the first time fib(n) is computed, it will have an O(n) ruintime, but then have an O(1) run-time. If we are computing a lot of Fiboniacci values, we will see an amortised run-time complexity of O(1).

Data structures and their approximate time complexity for some common

Data structures

In a recent conversation, I realised that some of the more basic data structures have some non-intuitive behaviour. This is, of course, an excellent excuse to waffle on about the guarantees that a well-implemented data structure gives you, as well what it DOESN'T give you. The time complexity is estimated for a data structure with n elements.

I am noting down time complexity in so-called "Big O" notation. This, approximately, describe how the time to do a given task grows with the size of the input. Roughly speaking, O(1) is "constant time", O(n) is "linear time" (where doubling the size of the input means it takes double the time), O(n^2) is "quadratic time" (doubling the size of input will require quadruple time) and O(2^n) is "exponential time" (where adding one item to process will require double the time).

I am also choosing to only consider "time complexity", rather than "space complexity" (another possible complexity measure), mostly because time complexity is usually the more interesting complexity (especially with the size of storage these days).

Linked list

OperationTime complexity
Adding an element at head O(1)
Adding an internal element O(n)
Changing an element at head O(1)
Changing an internal element O(1)
Removing an element at head O(1)
Removing an internal element O(n)
Retrieving the Nth element O(n)

This is your average single-linked list, a humble, simple data structure composed of nodes containing one item of data and one pointer to the next node in the list. It can be used to build more complex data structures. Note that a linked list can trivially be used as a stack by adding and removing elements at the list head, where doing so is cheap.

The cost for changing an internal element is based on already having a pointer to it, if you need to find the element first, the cost for retrieving the element is also taken.

To remove an internal element, you need to scan the list to find the preceeding element, to change its tail pointer. Similarly for adding an internal element (in some situations, you can amortise this cost by keeping track of the preceeding element as you follow the tail pointers, then you can get away with inserting the new element at O(1)).

The single-linked list is order-preserving (that is, you can rely on the list being in the same order when you scan through it, unless you've explicitly re-ordered it)

Double-linked list

OperationTime complexity
Adding an element at head O(1)
Adding an internal element O(1)
Changing an element at head O(1)
Changing an internal element O(1)
Removing an element at head O(1)
Removing an internal element O(1)
Retrieving the Nth element O(n)

The double-linked list is a bit more complex than the sigle-linked list. It keeps a value and pointers to preceding and following nodes. This means that some operations is faster than with a single-linked list, but there's a constant overhead, both in storage and in time, whenever a list modification happens.

The double-linked list is order-preserving (that is, you can trust it to be in the same order unless you've explicitly re-ordered it).

Variants on single- and double-linked lists

Both kinds of lists can have an associated "list header" node that contains pointers to the first and last elements of the list it concerns. With this in place, adding or changing an element in the "tail" position is O(1) and for a double-linked list it is also O(1) to remove an element in a tail position.

Vector/one-dimensional array

OperationTime complexity
Adding an element at head O(n)
Adding an internal element O(n)
Changing an element at head O(1)
Changing an internal element O(1)
Removing an element at head O(n)
Removing an internal element O(n)
Retrieving the Nth element O(1)

The array is a contigous area of memory, with the mth element placed m*size octets from the start of the array. This means that any addition or removal of elements require memory copies, making size-changing operations expensive. In a scenario where elements are occasionally added to the array, in the "tail" position, the cost of doing so can be amortised by doubling the size of the array allocation and manually keeping track of where the end is supposed to be, turning the O(n) cost of adding an element into an amortised O(1).

The vector is order-preserving (that is, you can trust it to stay in the same order unless you explicitly re-order it).

Hash table

OperationTime complexity
Adding an element at head O(1)
Adding an internal element O(1)
Changing an element at head O(1)
Changing an internal element O(1)
Removing an element at head O(1)
Removing an internal element O(1)
Retrieving the Nth element O(1)

The hash table is not (really) O(1), it's just amortised O(1), any specific operation can take longer, but on average, adding, removing and changing elements is O(1).

The hash table is not order-preserving, there is no guarantee that two traversals of all elements will return them in the same order. However, it is (usually) the case that two traversals with no added, removed or changed elements will have the same order (this MAY be different in garbage-collecting languages, if memory location is used as the thing to be hashed, as each access after GC would require a re-hash).

I have taken the liberty to interpret "head position" as "key that would be sorted lower than any other key", but since hash tables are not order-preserving, it doesn't make much sense.

"Simple" string

OperationTime complexity
Adding an element at head O(n)
Adding an internal element O(n)
Changing an element at head O(1)
Changing an internal element O(1)
Removing an element at head O(n)
Removing an internal element O(n)
Retrieving the Nth element O(1)
With a "simple" string, I am referring to a string where each character is stored in the same amount of storage. This (usually) means that you're talking about single-byte strings (though some languages use UCS4 to store in-memory strings, converting to and from external storage formats on I/O). A "simple" string is essentially a vector of characters, although some environments will attach more information to strings than they do to vectors. In essence, however, the performance guarantees of a "simple" string is those of a vector.

"Complex" string

OperationTime complexity
Adding an element at head O(n)
Adding an internal element O(n)
Changing an element at head O(n)
Changing an internal element O(n)
Removing an element at head O(n)
Removing an internal element O(n)
Retrieving the Nth element O(n)
With a "complex" string, I am referring to a string storage scheme that uses something like UTF-8 to store strings in memory. While more compact than storing strings in UCS4, it does mean that it is no longer trivial to index into the string. Some of the convenience of the "simple" string scheme can be brought back with careful use of helper functions (usually in the form of some sort of iteration library).

Heap

OperationTime complexity
Adding an element at head O(log n)
Adding an internal element O(log n)
Changing an element at head n/a
Changing an internal element n/a
Removing an element at head O(log n)
Removing an internal element n/a
Retrieving the Nth element n/a
A heap is a recursive, treelike data struvture with one guarantee. The root of a heap is "smaller" than the leaf heaps. There is no guarantee that there's any ordering between the sub-heaps. With some implementation strategies, balancing the size of the sub-heaps is (essentially) instrinsic, but there's no guarantee of that either.

The only way to retrieve the Nth element would be to remove the N first elements, remember the value of the Nth and then put them all back together.

It does not make (much) sense changing heap-stored elements, so I have marked that as "not applicable".

Balanced binary tree

OperationTime complexity
Adding an element at head O(lg n)
Adding an internal element O(lg n)
Changing an element at head O(lg n)
Changing an internal element O(lg n)
Removing an element at head O(lg n)
Removing an internal element O(lg n)
Retrieving the Nth element O(lg n)
"Change" doesn't, quite, make sense for these, if you expect "change" to mean "change the key value". Changing other than what we're expecting to use as a key is, however, possible.

In many respects, a heap is a specialised binary tree.

Data structure

OperationTime complexity
Adding an element at head O()
Adding an internal element O()
Changing an element at head O()
Changing an internal element O()
Removing an element at head O()
Removing an internal element O()
Retrieving the Nth element O()

Thanks

I'd like to thank Örjan Westin, Jim Prewett and Stephen Harris, for valuable criticism during the writing of his essay. Similarly, the StackOverflow community for bringing it back to memory. It could probably stand an overhaul with more exciting data structures (skip lists and what-not), but that is "not for now".

KOM, USENET and...

A brief history of electronic conferencing

Overview

Electronic conferencing has been with us since at least the mid-70s. The first systems were not networked, with (as far as I can tell) USENET (or NetNews) being the first system/protocol designed explicitly for distributed use, sometime 1978/1979. FidoNet, as popular as it was in its heyday, is a relative late-comer, as it stems from 1984.

Beginning in the mid-90s, assorted web fora started becoming popular, there's basically too many to list, but one characteristic they all share is a lack of (obvious) threading. Some even do not know what you have or haven't read, making it harder than necessary to come back later.

KOM and its relatives

The system I "grew up with" (not, ny any means, the first I used, but the one that definitely got me hooked) was KOM. Specifically, UFH-KOM. It had at least a spiritual ancestor in QZ-KOM, the KOM system developed for QZ, the academic computing centre at Stockholm University.

A typical KOM system is split into several subject areas, called "conference(s)" (sw: 'möte(n)'), these correspond roughly to newsgroups in USENET or areas in at least some BBS packages. It also features subject threading (or "comment chains" as it's mostly known in KOM).

KOM typically use a command-line prompt with "full" commands. Some KOM systems will accept various short-forms of commands and may even have defaults (one common short-cut is to have "read next" as the default on pressing return at the prompt).

KOM is, itself, the spiritual descendant of PLANET/FORUM, from the mid-70s (I have not been able to find a definite "was created" date for it, but it seems to have been around in August 1975).

BBSes

At least in popular history, the first BBS was created around Chicago early 1978, by Ward Christensen. The phenomenon spread rapidly and (as you might have guessed) eventually cross beams with KOM, as the Swedish BBS software TCL (The Common Link) was heavily inspired by KOM.

USENET

In the 1979 timeframe, USENET got started in North Carolina, using a bunch of shell scripts to exchange articles between three machines. These were then re-written in C and soon replaced by Anews. Anews was then replaced by Bnews in 1982.

USENET has the ability to keep threads of comments by the use of the References header, where each posted article can refer to its predecessor(s) in the thread.

As is obvious from its history, USENET is a distributed system, flooding articles from one node to another until such a time that all "next nodes" claim to not want any specific article.

FidoNet

FidoNet is a communication protocol between BBSes and was very popular in Sweden (off-hand, most BBSes I can remember from Sweden were on FidoNet). I can't, as always, speak for its popularity elsewhere, but I get the impression that it wasn't entirely unpopular and the phone billing practices in the US would have helped in making it popular.

Moral outrages

Moral outrage, generalities and a specific instance

It is easy to go "I don't like this", "I find this harmful" or "I am outraged at behaviour X" and ask oter people to ban it. It's seductively easy, really.

But, if we take a step back, is it a good thing to do? I claim it isn't and I will try to explain why. Be aware that I will use epilepsy and a petition created by an epileptic as an explicit exampe further on. I will not pull punches, as and where I think they're deserved to not be pulled, but I will also try not to be offensive for the sake of being offensive.

So, what is this "moral outrage" on which I pontificate? It is an indignation over a wrong, perceived or real, that prompts an individual X to ask other individuals to regulate reality so that it suits X. It takes many forms, like forbidding support to programs trying to help the developed world and include information on family planning (like "use of condoms"), you can find it in law regulating Internet use, you can find it in semi-law (like the Internet Watch Foundation in the UK) and, in our specific example, a petition to YouTube to ban videos mocking epilepsy. What I am trying to say is that you have no moral right not to be offended. At all. You should, however, be provided with enough information to know where you are and are not likely to be offended. You also have the right to expect that a "no offense to sensibility Y" room (I use "room" in a very general sense here, as describing something that may be a physical or an informational space)

Moving from generalities to specifics, I've read the petition. I feel for Ms XXX YYY. I understand that she now sees YouTube as a hostile environment. That is, truly, sad. If she has been subjected to auto-starting YouTube videos embedded in a web page, I agree that the video should at the very least not have been auto-starting.

However, as the petition is worded, she has not asked for that. She has asked YouTube to forbid mockery of epilepsy and as much as I want to, I will not, nay cannot agree with this petition and think it should be politely ignored by Google, as it tries to curtail the very freedom of speech that the petitioner speaks so highly of. Had this been for a site (or a section of a site) specifically targetted towards providing those who suffers from epilepsy a forum, information, assistance or just mutual support, I think the petition would be spot-on, but to the best of my knowledeg, YouTube is none of these things.

Looking at what options are available for tagging and flagging videos on YouTube, I do not see any "may trigger seizure" (or, possibly better, "contains flashing images"). This would, in my opinion, be a better thing to petition for.

Note, I do not have epilepsy, but I know people who do. I do not find it funny to force videos with extensive flashing that is likely to trigger seizures. I, personally, have a problem with strobe lights, but it is not seizure-related. I find rapidly flashing images to be stressful and strobe lights in an otherwise dark room with people moving about is more than likely to induce stress-related anciety and aggression in me. So while I may not speak from the exact point of view that the petitioner wants me to see things from, I do have at least a smidgen of an idea where she is coming from. I still think it would be insane for me to even write a petition banning strobe lights to be used in dance halls (or whatever the young ones call it today), it is much easier for me to avoid these.