While we're talking about stats and Python, could I convince someone here to implement a fast medcouple for statsmodels? I can't do it myself because I read R's GPL'ed code in order to understand the algorithm. Using my understanding, I wrote the following high-level description of it: https://en.wikipedia.org/wiki/Medcouple
This should be taken as the design spec of a clean-room reverse engineering, so that we can have a free, fast and non-copylefted implementation. It's not that I have a problem with copyleft (in fact, I prefer it), but I really want statsmodels to fix their implementation, and they're GPL-phobic.
Since Seaborn has boxplots, implementing an adjusted boxplot seems relevant.
edit: Oh, one more thing. I'd love any feedback on how to improve the "design spec", in case I wasn't able to make it clear enough.
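To make the request concrete, here is a minimal sketch of the slow version: a brute-force O(n^2) medcouple that follows the textbook definition (the median of the kernel h over all pairs x_i >= m >= x_j), plus the adjusted-boxplot fences as I understand the usual formula. It is only meant as a correctness baseline that a fast implementation could be tested against, not as the fast algorithm itself. The names `naive_medcouple` and `adjusted_fences` are made up for illustration, the tie rule for observations equal to the median is my assumption about the usual convention, and the quartiles come from `statistics.quantiles` with its default method, which may differ from what a given plotting library uses.

```python
import math
from statistics import median, quantiles


def naive_medcouple(data):
    """Brute-force O(n^2) medcouple: a baseline, not the fast algorithm."""
    xs = sorted(data)
    if not xs:
        raise ValueError("medcouple of an empty sample is undefined")
    m = median(xs)
    upper = [x for x in xs if x > m]
    lower = [x for x in xs if x < m]
    k = len(xs) - len(upper) - len(lower)  # observations tied with the median

    plus = upper + [m] * k    # every x_i >= m
    minus = [m] * k + lower   # every x_j <= m

    # Ordinary kernel for every pair that is not a median/median tie.
    h = [((xi - m) - (m - xj)) / (xi - xj)
         for xi in plus for xj in minus if xi != xj]

    # Assumed tie convention: the k x k block of median/median pairs gets
    # -1, 0 or +1 depending on which side of the anti-diagonal the pair is.
    def sign(v):
        return (v > 0) - (v < 0)

    h += [sign(a + b - (k - 1)) for a in range(k) for b in range(k)]
    return median(h)


def adjusted_fences(data, mc):
    """Skewness-adjusted boxplot fences for a given medcouple value mc."""
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' method (assumption)
    iqr = q3 - q1
    # The exponents swap depending on the sign of the medcouple.
    lo_exp, hi_exp = (-4.0, 3.0) if mc >= 0 else (-3.0, 4.0)
    lower_fence = q1 - 1.5 * math.exp(lo_exp * mc) * iqr
    upper_fence = q3 + 1.5 * math.exp(hi_exp * mc) * iqr
    return lower_fence, upper_fence
```

On the symmetric sample [1, 2, 3, 4, 5] the sketch returns 0, and with mc = 0 the fences reduce to the ordinary Q1 - 1.5 IQR and Q3 + 1.5 IQR. The quadratic memory and time are exactly what a fast implementation would need to avoid.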
Are you really suggesting that it's not possible for you to write an independent implementation of an algorithm from the version you once read? Is this sort of "clean room" approach typical? It strikes me as absurdly cautious.
It is what is known to be legally safe. We do it all the time in GNU Octave, and we always tell people to not read Matlab source code when implementing Octave functions.
Besides, I just don't feel like it's fair to the R copyright authors. They worked hard to produce an implementation and they copylefted it, and I heavily relied on their implementation in order to reimplement it myself.
According to the United States Copyright Office[1], the algorithm itself can't be copyrighted, so that's why I wrote a high-level description of the algorithm.
----
[1] "Copyright protection is not available for ideas, program logic, algorithms, systems, methods, concepts, or layouts."
I agree -- you shouldn't read sources you don't want to treat as derivative while you're writing an independent implementation. So yes, the code you've already written is GPL.
But suppose you decide to reimplement the algorithm now, months later (based only on the notes you wrote on Wikipedia). I'm not a lawyer, but I would say that's almost certainly independent, unless you have extraordinary memory.
I don't know. Maybe. I don't know how a judge and a jury would interpret that situation. They might agree with you or they might not. The only safe jurisprudence I know of is clean-room design, and it's what the SFLC documents I have received recommend.
Interesting dilemma. I had assumed that re-writing in another language basically meant you were copyright free (as long as you're not literally transcribing or automatically translating), but it appears it's not that simple: http://imranontech.com/2006/12/04/are-algorithms-copyrightab...
I'm new to Seaborn and matplotlib in general, but Seaborn is a wrapper on top of matplotlib and, from what I can tell, was born partly out of frustration with how hard it is to get matplotlib graphics to look decent out of the box. That makes it, in one sense, kind of like what ggplot2 was to R's standard plotting tools.
However, Seaborn has a more object-oriented API, among other things:
> Seaborn’s goals are similar to those of R’s ggplot, but it takes a different approach with an imperative and object-oriented style that tries to make it straightforward to construct sophisticated plots. If matplotlib “tries to make easy things easy and hard things possible”, seaborn aims to make a well-defined set of hard things easy too.
There already is an attempt to port ggplot over to Python, and its authors' opinion is that its API should look like R's ggplot2, which means the syntax is not Pythonic: http://ggplot.yhathq.com/
Matplotlib's ggplot style sheet gets you ~75% of the way to ggplot-style plots. I've found a number of edge cases where the plots don't turn out right when using the ggplot sheet. BUT the important part is that you can set your own default plotting style with a single line of code and keep everything nice and Pythonic (well, kind of Pythonic, since you're using matplotlib...)
Seaborn is my favorite statistical plotting package in Python. I wrote an astro plotting package that digs deep into the Matplotlib internals and it was not easy. Big props to the developer behind Seaborn and the great aesthetics he imbued it with.
I love using Seaborn for my plots, and I find myself referring to its color palettes for other data visualizations too! I wish there were more variety, but thanks to those who have contributed to the project.
https://en.wikipedia.org/wiki/Medcouple