
Feed aggregator


Adaptive thresholding for binarization

Matlab Image processing blog - 2016, July 25 - 08:01

Despite recent appearances to the contrary on the blog, I still exist! It's just been a little crazier than usual for the last month or so.

Anyway ... I'm back, and I'm going to try to wrap things up about image binarization. In my 14-Jun-2016 post, I discussed the algorithm underlying imbinarize for the global thresholding case. Today I'm going to talk about the algorithm for the adaptive thresholding case.

Here's an image suffering from an extreme case of nonuniform illumination.

I = imread('printedtext.png');
imshow(I)

Here is the binarization using a global threshold.

bw1 = imbinarize(I);
imshow(bw1)
title('Global threshold')

And here is the binarization using an adaptive threshold. Note that we have to tell the function that the foreground pixels (representing text characters) are darker than the background pixels (the white paper).

bw2 = imbinarize(I,'adaptive','ForegroundPolarity','dark');
imshow(bw2)
title('Adaptive threshold')

The algorithm used by imbinarize(I,'adaptive',...) is sometimes called Bradley's method, for the paper by D. Bradley and G. Roth, "Adaptive Thresholding Using the Integral Image," Journal of Graphics Tools, vol. 12, issue 2, pp. 13-21, 2007.

This method uses a large-neighborhood mean filter. If an input image pixel is more than a certain percentage greater than the local mean (the output of that mean filter), then it is set to white.

To perform large-neighborhood mean filtering (also called box filtering) efficiently, the implementation uses something called an integral image. With this technique, the time required to perform mean filtering depends only on the number of image pixels. The time is independent of the neighborhood size. Maybe I'll discuss integral images and box filtering in a future post. In the meantime, you can look at integralImage and imboxfilt.
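As a quick illustration (not the toolbox implementation), the windowed sum behind such a mean reduces to four lookups in the integral image. The window size and location below are arbitrary examples.

% Mean of I over a w-by-w window using an integral image: four lookups,
% no matter how large w is. (Window assumed to lie fully inside the image.)
I = im2double(imread('printedtext.png'));
w = 15;  h = floor(w/2);       % example window size (odd)
r = 200; c = 300;              % example window center
J = integralImage(I);          % (M+1)-by-(N+1) cumulative sums, zero-padded at the top and left
S = J(r+h+1,c+h+1) - J(r-h,c+h+1) - J(r+h+1,c-h) + J(r-h,c-h);
localMean = S / w^2            % same value as mean2(I(r-h:r+h, c-h:c+h))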

So how big is the mean filter neighborhood? Well, there's no fixed rule. This is another one of those magic numbers that bedevil image processing. The function imbinarize uses a square neighborhood that is about 1/8 of the smallest image dimension. This is just a heuristic rule that works reasonably well for a variety of images.
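To make that concrete, here's a minimal sketch of the idea (not the actual imbinarize code). Following the formulation in Bradley and Roth's paper, a pixel is set to black when it falls more than a certain percentage below its local mean, and to white otherwise; the percentage s is just an example value.

I = im2double(imread('printedtext.png'));
nhood = 2*floor(min(size(I))/16) + 1;    % odd window, roughly 1/8 of the smaller image dimension
localMean = imboxfilt(I, nhood);         % large-neighborhood (box) mean filter
s = 0.1;                                 % illustrative percentage
bw = I > (1 - s)*localMean;              % white unless the pixel is more than s below its local mean
imshow(bw)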

The function imbinarize does everything for you in one step. It computes the adaptive threshold image and then applies it to produce a binary output image. If you want the adaptive threshold image itself, or if you want more control over how the adaptive threshold image is computed, then you can use adaptthresh.

Here is the adaptive threshold image for the printed text example shown above.

T = adaptthresh(I,'ForegroundPolarity','dark');
imshow(T)
title('Adaptive threshold image')

When you use adaptthresh, you can control the neighborhood size directly. You can also specify other local background measurement methods, including median filtering and Gaussian filtering.
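For example, here is a hedged sketch of that kind of call; the sensitivity, neighborhood size, and statistic are illustrative choices, not recommendations.

T = adaptthresh(I, 0.4, ...
    'NeighborhoodSize', 2*floor(size(I)/16) + 1, ...  % [rows cols], must be odd
    'Statistic', 'median', ...                        % local background estimate
    'ForegroundPolarity', 'dark');
bw3 = imbinarize(I, T);     % apply the locally varying threshold
imshow(bw3)
title('Adaptive threshold, median statistic')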

Wrapping Up

With the new set of Image Processing Toolbox interfaces, use imbinarize as your one-step solution for both global and adaptive thresholding. Gain finer control over algorithm details, if you need to, by using the underlying functions otsuthresh and adaptthresh. The older functions, im2bw and graythresh, still exist for compatibility, but we encourage you to use the new functions in your new code.




Published with MATLAB® R2016a


Categories: Blogs

EFF is suing the US government to invalidate the DMCA’s DRM provisions

Cory Doctorow - 2016, July 21 - 07:24

The Electronic Frontier Foundation has just filed a lawsuit challenging the constitutionality of Section 1201 of the DMCA, the law’s “Digital Rights Management” provision: a notoriously overbroad rule that bans activities that bypass or weaken copyright access-control systems, including reconfiguring software-enabled devices (making sure your IoT light-socket will accept third-party lightbulbs; tapping into diagnostic info in your car or tractor to allow an independent party to repair it) and reporting security vulnerabilities in these devices.


EFF is representing two clients in its lawsuit: Andrew “bunnie” Huang, a legendary hardware hacker whose NeTV product lets users put overlays on DRM-restricted digital video signals; and Matthew Green, a heavyweight security researcher at Johns Hopkins who has an NSF grant to investigate medical record systems and whose research plans encompass the security of industrial firewalls and finance-industry “black boxes” used to manage the cryptographic security of billions of financial transactions every day.

Both clients reflect the deep constitutional flaws in the DMCA, and both have standing to sue the US government to challenge DMCA 1201 because of its serious criminal provisions (5 years in prison and a $500K fine for a first offense).

The US Trade Rep has propagated the DMCA’s anticircumvention rules to most of the world’s industrial nations, and a repeal in the US will strengthen the argument for repealing their international cousins.

Huang has written an inspirational essay explaining his reasons for participating in this suit, explaining that he feels it is his duty to future generations:

Our recent generation of Makers, hackers, and entrepreneurs have developed under the shadow of Section 1201. Like the parable of the frog in the well, their creativity has been confined to a small patch, not realizing how big and blue the sky could be if they could step outside that well. Nascent 1201-free ecosystems outside the US are leading indicators of how far behind the next generation of Americans will be if we keep with the status quo.

Our children deserve better.

I can no longer stand by as a passive witness to this situation. I was born into a 1201-free world, and our future generations deserve that same freedom of thought and expression. I am but one instrument in a large orchestra performing the symphony for freedom, but I hope my small part can remind us that once upon a time, there was a world free of such artificial barriers, and that creativity and expression go hand in hand with the ability to share without fear.

The EFF’s complaint, filed minutes ago with the US District Court, is as clear and comprehensible an example of legal writing as you could ask for. It builds on two recent Supreme Court precedents (Golan and Eldred), in which the Supremes stated that the only way to reconcile free speech with copyright’s ability to restrict who may utter certain words and expressions is fair use and other exemptions to copyright, which means that laws that don’t take fair use into account fail to pass constitutional muster.

In this decade, more and more companies have figured out that the DMCA gives them the right to control follow-on innovation and suppress embarrassing revelations about defects in their products; consequently, DMCA 1201-covered technologies have proliferated into cars and tractors, medical implants and home security systems, thermostats and baby-monitors.

With this lawsuit, the EFF has fired a starter pistol in the race to repeal section 1201 of the DMCA and its cousins all over the world: to legitimize the creation of commercial businesses that unlock the value in the gadgets you’ve bought that the original manufacturers want to hoard for themselves; to open up auditing and disclosure on devices that are disappearing into our bodies, and inside of which we place those bodies.

I’ve written up the lawsuit for the Guardian:


Suing on behalf of Huang and Green, EFF’s complaint argues that the wording of the statute requires the Library of Congress to grant exemptions for all conduct that is legal under copyright, including actions that rely on fair use, when that conduct is hindered by the ban on circumvention.

Critically, the supreme court has given guidance on this question in two rulings, Eldred and Golan, explaining how copyright law itself is constitutional even though it places limits on free speech; copyright is, after all, a law that specifies who may utter certain combinations of words and other expressive material.

The supreme court held that through copyright’s limits, such as fair use, it accommodates the first amendment. The fair-use safety valve is joined by the “idea/expression dichotomy”, a legal principle that says that copyright only applies to expressions of ideas, not the ideas themselves.

In the 2015 DMCA 1201 ruling, the Library of Congress withheld or limited permission for many uses that the DMCA blocks, but which copyright itself allows – activities that the supreme court has identified as the basis for copyright’s very constitutionality.

If these uses had been approved, people such as Huang and Green would not face criminal jeopardy. Because they weren’t approved, Huang and Green could face legal trouble for doing these legitimate things.


MATTHEW GREEN, ANDREW HUANG and ALPHAMAX, LLC v U.S. DEPARTMENT OF JUSTICE,
LORETTA LYNCH: COMPLAINT FOR DECLARATORY
AND INJUNCTIVE RELIEF
[EFF]

America’s broken digital copyright law is about to be challenged in court
[Cory Doctorow/The Guardian]

Why I’m Suing the US Government
[Andrew “bunnie” Huang]

Section 1201 of the DMCA Cannot Pass Constitutional Scrutiny

[Kit Walsh/EFF]

(Image: Bunnie Huang, Joi Ito, CC-BY)

Categories: Blogs

Comicon Schedule!

Flog - 2016, July 18 - 19:30

Hey all, here’s my COMICON SCHEDULE!

Thursday:
12-1pm Geek and Sundry Panel at Indigo Ballroom (Badge needed)
2pm: Nerd HQ Panel

Friday
10:30-11:30am: ConMan Panel Hall H (Badge Needed)
2:45pm: Q/A at Petco Park Stage

Saturday:
4-4:45pm MST3K Panel at Petco Park Stage
5:15-6:15pm MST3K SHOUT! Book Signing (Badge needed)
8:30-9:30pm MST3K Panel Room 24ABC (Badge needed)

Come by and say hi if you have time!

Categories: Blogs

My interview on Utah Public Radio’s “Access Utah”

Cory Doctorow - 2016, July 12 - 10:24

Science fiction novelist, blogger and technology activist Cory Doctorow joins us for Tuesday’s AU. In a recent column, Doctorow says that “all the data collected in giant databases today will breach someday, and when it does, it will ruin peoples’ lives. They will have their houses stolen from under them by identity thieves who forge their deeds (this is already happening); they will end up with criminal records because identity thieves will use their personal information to commit crimes (this is already happening); … they will have their devices compromised using passwords and personal data that leaked from old accounts, and the hackers will spy on them through their baby monitors, cars, set-top boxes, and medical implants (this is already hap­pening)…” We’ll talk with Cory Doctorow about technology, privacy, and intellectual property.

Cory Doctorow is the co-editor of popular weblog Boing Boing and a contributor to The Guardian, Publishers Weekly, Wired, and many other newspapers, magazines and websites. He is a special consultant to the Electronic Frontier Foundation, a non-profit civil liberties group that defends freedom in technology law, policy, standards and treaties. Doctorow is also an award-winning author of numerous novels, including “Little Brother,” “Homeland,” and “In Real Life.”

MP3

Categories: Blogs

As browsers decline in relevance, they’re becoming DRM timebombs

Cory Doctorow - 2016, July 8 - 10:08


My op-ed in today’s issue of The Tech, MIT’s leading newspaper, describes how browser vendors and the W3C, a standards body that’s housed at MIT, are collaborating to make DRM part of the core standards for future browsers, and how their unwillingness to take even the most minimal steps to protect academics and innovators from the DMCA will put the MIT community in the crosshairs of corporate lawyers and government prosecutors.

If you’re a researcher or security/privacy expert and want to send a message to the W3C that it has a duty to protect the open web from DRM laws, you can sign this open letter to the organization.

The W3C’s strategy for “saving the web” from the corporate-controlled silos of apps is to replicate the systems of control that make apps off-limits to innovation and disruption. It’s a poor trade-off, one that sets a time-bomb ticking in the web’s foundations, making the lives of monopolists easier, and the lives of security researchers and entrepreneurs much, much more perilous.

The Electronic Frontier Foundation, a W3C member, has proposed a compromise that will protect the rights of academics, entrepreneurs, and security researchers to make new browser technologies and report the defects in the old ones: we asked the W3C to extend its patent policy to the DMCA, so that members who participated in making DRM would have to promise not to use the DMCA to attack implementers or security researchers.

But although this was supported by a diverse group of W3C members, the W3C executive did not adopt the proposal. Now, EME has gone to Candidate Recommendation stage, dangerously close to completion. The purpose of HTML5 is to provide the rich interactivity that made apps popular, and to replace apps as the nexus of control for embedded systems, including the actuating, sensing world of “internet of things” devices.

We can’t afford to have these devices controlled by a system that is a no-go zone for academic work, security research, and innovative disruption. Although some of the biggest tech corporations in the world today support EME, very few of them could have come into being if EME-style rules had been in place at their inception. A growing coalition of leading international privacy and security researchers have asked the W3C to reconsider and protect the open web from DRM, a proposal supported by many W3C staffers, including Danny Weitzner (CSAIL/W3C), who wrote the W3C’s patent policy.

Browsers’ bid for relevance is turning them into time-bombs
[Cory Doctorow/The Tech]

(Image: Wfm stata center, Raul654, CC-BY-SA)

Categories: Blogs

Peak indifference: privacy as a public health issue

Cory Doctorow - 2016, July 3 - 18:57

My latest Locus column, “Peak Indifference”, draws a comparison between the history of the “debate” about the harms of smoking (a debate manufactured by disinformation merchants with a stake in the controversy) and the current debate about the harms of surveillance and data-collection, whose proponents say “privacy is dead,” while meaning, “I would be richer if your privacy were dead.”


Smoking’s harms were hard to pin down in part because the gap between cause (a drag on a cigarette) and effect (cancer) was neither immediate nor absolute. Most drags on cigarettes don’t cause cancer, just like most privacy disclosures don’t harm you. But with enough drags (or enough private information sucked up via surveillance capitalism), disaster is inevitable.


Long before smoking became unacceptable, there was a moment of “peak indifference,” the moment when the number of people who weren’t worried about smoking started to decline, never to recover. The privacy wars are reaching that moment now, with millions of people having their lives ruined by data breaches, and that means there’s a new tactical challenge for privacy advocates.

Rather than convincing people to care about privacy, now we have to convince them to do something about it.

The anti-smoking movement made great strides with this. They made sure that people who had cancer – or whose loved ones did – understood that tobacco’s use wasn’t a blameless, emergent phenomenon. They named names and published documents, showing exactly who conspired to destroy lives with cancer in order to enrich themselves. They surfaced and highlighted the risks to non-smokers’ lives from smoking: not just second-hand smoke, but also the public health burdens and the terrible losses felt by survivors after their loved ones had perished. They de­manded architectural changes – bans on smoking – and legal ones, and market ones, and normative ones. Peak indifference let those activists move from convincing to fighting back.

That’s why it’s time for privacy activists to start thinking of new tac­tics. We are past peak indifference to online surveillance: that means that there will never be a moment after today in which fewer people are alarmed by the costs of sur­veillance. The bad news is that 20 years of failing to convince people of the risks of online privacy has built up a reservoir of inevitable harms: all the data collected in giant databases today will breach someday, and when it does, it will ruin peoples’ lives. They will have their houses stolen from under them by identity thieves who forge their deeds (this is already happening); they will end up with criminal records because identity thieves will use their personal information to commit crimes (this is already happening); they will be accused of terrorism or other life-destroying categories of crimes because an algorithm has mined their data to come to a conclusion they aren’t allowed to see or interrogate (this is already happening); they will have their devices compromised using passwords and personal data that leaked from old accounts, and the hackers will spy on them through their baby monitors, cars, set-top boxes, and medical implants (this is already hap­pening); they will have the sensitive information they disclosed to the government to attain security clearance breached and warehoused by blackmailing enemy states (this is already happening); their employers will fail when their personal information is used to commit industrial espionage (this is already happening).

Peak Indifference [Locus Magazine]

Categories: Blogs

I’m profiled in the Globe and Mail Report on Business magazine

Cory Doctorow - 2016, June 27 - 10:35

The monthly Report on Business magazine in the Canadian national paper The Globe and Mail profiled my work on DRM reform, as well as my science fiction writing and my work on Boing Boing.

I’m grateful to Alec Scott for the coverage, and especially glad that the question of the World Wide Web Consortium’s terrible decision to standardize DRM as part of HTML5 is getting wider attention.

If you want to learn more, here’s a FAQ, and here’s a letter you can sign onto in which we’re asking the W3C to take steps to protect security disclosures and competition on the web.

He doesn’t always have the last word with Berners-Lee, though. “I was surprised and disappointed that he recently announced that W3C was going to start standardizing DRM.…There is a sense among a lot of people that the Web is cooked.”

W3C is the World Wide Web Consortium, which Berners-Lee runs, and Doctorow is upset because it’s setting up a standardized regime for digital rights management, or DRM—the locks that tech and entertainment companies put on their products—to prevent people from sharing their wares.

Doctorow criticizes American and Canadian legislation that makes it an offence to tamper with these locks. After all, analog publishers can’t control what use purchasers make of their books. And the locks seldom help the creatives who originally produced the content. In joking homage to Isaac Asimov’s laws of robotics, Doctorow has his own law: “Any time someone puts a lock on something that belongs to you and won’t give you the key, that lock isn’t there for your benefit.”


The crusader fighting lock-happy entertainment conglomerates
[Alec Scott/The Globe and Mail]

Categories: Blogs

How to protect the future web from its founders’ own frailty

Cory Doctorow - 2016, June 24 - 11:15

Earlier this month, I gave the afternoon keynote at the Internet Archive’s Decentralized Web Summit, and my talk was about how the people who founded the web with the idea of having an open, decentralized system ended up building a system that is increasingly monopolized by a few companies — and how we can prevent the same things from happening next time.

The speech was very well received — it got a standing ovation — and has attracted a lot of discussion since.

Jonke Suhr has done me the service of transcribing the talk, which will facilitate translating it into other languages as well as making it accessible to people who struggle with video. Many thanks, Jonke!

This is also available as an MP3 and a downloadable video.

I’ve included an edited version below:

So, as you might imagine, I’m here to talk to you about dieting advice. If you ever want to go on a diet, the first thing you should really do is throw away all your Oreos.

It’s not that you don’t want to lose weight when you raid your Oreo stash in the middle of the night. It’s just that the net present value of tomorrow’s weight loss is hyperbolically discounted in favor of the carbohydrate rush of tonight’s Oreos. If you’re serious about not eating a bag of Oreos your best bet is to not have a bag of Oreos to eat. Not because you’re weak willed. Because you’re a grown up. And once you become a grown up, you start to understand that there will be tired and desperate moments in your future and the most strong-willed thing you can do is use the willpower that you have now when you’re strong, at your best moment, to be the best that you can be later when you’re at your weakest moment.

And this has a name: It’s called a Ulysses pact. Ulysses was going into Siren-infested waters. When you go into Siren-infested waters, you put wax in your ears so that you can’t hear what the Sirens are singing, because otherwise you’ll jump into the sea and drown. But Ulysses wanted to hear the Sirens. And so he came up with a compromise: He had his sailors tie him to the mast, so that when he heard the call of the Sirens, even though he would beg and gibber and ask them to untie him, so that he could jump into the sea, he would be bound to the mast and he would be able to sail through the infested waters.

This is a thing that economists talk about all the time, it’s a really critical part of how you build things that work well and fail well. Now, building a Web that is decentralized is a hard thing to do, and the reason that the web ceases to be decentralized periodically is because it’s very tempting to centralize things. There are lots of short term gains to be had from centralizing things and you want to be the best version of yourself, you want to protect your present best from your future worst.

The reason that the Web is closed today is that people just like you, the kind of people who went to Doug Engelbart’s demo in 1968, the kind of people who went to the first Hackers conference, people just like you, made compromises, that seemed like the right compromise to make at the time. And then they made another compromise. Little compromises, one after another.

And as humans, our sensory apparatus is really only capable of distinguishing relative differences, not absolute ones. And so when you make a little compromise, the next compromise that you make, you don’t compare it to the way you were when you were fresh and idealistic. You compare it to your current, “stained” state. And a little bit more stained hardly makes any difference. One compromise after another, and before you know it, you’re suing to make APIs copyrightable or you’re signing your name to a patent on one-click purchasing or you’re filing the headers off of a GPL library and hope no one looks too hard at your binaries. Or you’re putting a backdoor in your code for the NSA.

And the thing is: I am not better than the people who made those compromises. And you are not better than the people who made those compromises. The people who made those compromises discounted the future costs of the present benefits of some course of action, because it’s easy to understand present benefits and it’s hard to remember future costs.

You’re not weak if you eat a bag of Oreos in the middle of the night. You’re not weak if you save all of your friends’ mortgages by making a compromise when your business runs out of runway. You’re just human, and you’re experiencing that hyperbolic discounting of future costs because of that immediate reward in the here and now. If you want to make sure that you don’t eat a bag of Oreos in the middle of the night, make it more expensive to eat Oreos. Make it so that you have to get dressed and find your keys and figure out where the all-night grocery store is and drive there and buy a bag of Oreos. And that’s how you help yourself in the future, in that moment where you know what’s coming down the road.

The answer to not getting pressure from your bosses, your stakeholders, your investors or your members, to do the wrong thing later, when times are hard, is to take options off the table right now. This is a time-honored tradition in all kinds of economic realms. Union negotiators, before they go into a tough negotiation, will say: “I will resign as your negotiator, before I give up your pension.” And then they sit down across the table from the other side, and the other side says “It’s pensions or nothing”. And the union leaders say: “I hear what you’re saying. I am not empowered to trade away the pensions. I have to quit. They have to go elect a new negotiator, because I was elected contingent on not bargaining away the pensions. The pensions are off the table.”

Brewster has talked about this in the context of code, he suggested that we could build distributed technologies using the kinds of JavaScript libraries that are found in things like Google Docs and Google Mail, because no matter how much pressure is put on browser vendors, or on technology companies in general, the likelihood that they will disable Google Docs or Google Mail is very, very low. And so we can take Google Docs hostage and use it as an inhuman shield for our own projects.

The GPL does this. Once you write code, with the GPL it’s locked open, it’s irrevocably licensed for openness and no one can shut it down in the future by adding restrictive terms to the license. The reason the GPL works so well, the reason it became such a force for locking things open, is that it became indispensable. Companies that wanted to charge admission for commodity components like operating systems or file editors or compilers found themselves confronted with the reality that there’s a huge difference between even a small price and no price at all, or no monetary price. Eventually it just became absurd to think that you would instantiate a hundred million virtual machines for an eleventh of a second and get a license and a royalty for each one of them.

And at that point, GPL code became the only code that people used in cloud applications in any great volume, unless they actually were the company that published the operating system that wasn’t GPL’d. Communities coalesced around the idea of making free and open alternatives to these components: GNU/Linux, Open- and LibreOffice, git, and those projects benefited from a whole bunch of different motives, not always the purest ones. Sometimes it was programmers who really believed ethically in the project and funded their own work, sometimes talent was tight and companies wanted to attract programmers, and the way that they got them to come through the door is by saying: “We’ll give you some of your time to work on an ethical project and contribute code to it.”

Sometimes companies got tactical benefits by zeroing out the margins on their biggest competitor’s major revenue stream. So if you want to fight with Microsoft, just make Office free. And sometimes companies wanted to use but not sell commodity components. Maybe you want to run a cloud service but you don’t want to be in the operating system business, so you put a bunch of programmers on making Linux better for your business, without ever caring about getting money from the operating system. Instead you get it from the people who hire you to run their cloud.

Every one of those entities, regardless of how they got into this situation of contributing to open projects, eventually faced hard times, because hard times are a fact of life. And systems that work well, but fail badly, are doomed to die in flames. The GPL is designed to fail well. It makes it impossible to hyperbolically discount the future costs of doing the wrong thing to gain an immediate benefit. When your investor or your acquisition suitor or your boss say “Screw your ethics, hippie, we need to make payroll”, you can just pull out the GPL and say: “Do you have any idea how badly we will be destroyed if we violate copyright law by violating the GPL?”

It’s why Microsoft was right to be freaked out about the GPL during the Free and Open Source wars. Microsoft’s coders were nerds like us: they fell in love with computers first, and became Microsoft employees second. They had benefited from freedom and openness, they had catted out BASIC programs, they had viewed sources, and they had an instinct towards openness. Combining that with the expedience of being able to use FLOSS (not having to call a lawyer before you could be an engineer), and with the rational calculus that if they made FLOSS they could keep using the code they had made there when they eventually left Microsoft, meant that Microsoft coders and Microsoft were working for different goals. And the way they expressed that was in how they used and licensed their code.

This works so well that for a long time, nobody even knew if the GPL was enforceable, because nobody wanted to take the risk of suing and setting a bad precedent. It took years and years for us to find out in which jurisdictions we could enforce the GPL.

That brings me to another kind of computer regulation, something that has been bubbling along under the surface for a long time, at least since the Open Source wars, and that’s the use of Digital Rights Management (DRM) or Digital Restrictions Management, as some people call it. This is the technology that tries to control how you use your computer. The idea is that you have software on the computer that the user can’t override. If there is remote policy set on that computer that the user objects to, the computer rejects the user’s instruction in favor of the remote policy. It doesn’t work very well. It’s very hard to stop people who are sitting in front of a computer from figuring out how it works and changing how it works. We don’t keep safes in bank robbers’ living rooms, not even really good ones.

But we have a law that protects it, the Digital Millennium Copyright Act (DMCA). It’s been around since 1998 and it has lots of global equivalents, like section 6 of the EUCD in Europe, implemented all across the EU member states. In New Zealand they tried to pass a version of the DMCA and there were uprisings and protests in the streets; they actually had to take the law off the books because it was so unpopular. And then the Christchurch earthquake hit and a member of parliament reintroduced it as a rider to the emergency relief bill to dig people out of the rubble. In Canada it’s Bill C-11 from 2011. And what it does is make it a felony to tamper with those locks, a felony punishable by a $500,000 fine and five years in jail for a first offense. It makes it a felony to do security auditing of those locks and publish information about the flaws that are present in them or their systems.

This started off as a way to make sure that people who bought DVDs in India didn’t ship them to America. But it is a bad idea whose time has come. It has metastasized into every corner of our world. Because if you put just enough DRM around a product that you can invoke the law, then you can use other code, sitting behind the DRM, to control how the user uses that product, to extract more money. GM uses it to make sure that you can’t get diagnostics out of the car without getting a tool that they license to you, and that license comes with a term that says you have to buy parts from GM, and so all repair shops for GM that can access your diagnostic information have to buy their parts from GM and pay monopoly rents.

We see it in insulin pumps, we see it in thermostats and we see it in the “Internet of Things rectal thermometer”, which debuted at CES this year, which means we now have DRM restricted works in our asses. And it’s come to the web. It’s been lurking in the corners of the web for a long time. But now it’s being standardized at the World Wide Web Consortium (W3C) to something called Encrypted Media Extensions (EME). The idea of EME is that there is conduct that users want to engage in that no legislature in the world has banned, like PVR’ing their Netflix videos. But there are companies that would prefer that conduct not to be allowed. By wrapping the video with just enough DRM to invoke the DMCA, you can convert your commercial preference to not have PVRs (which are no more and no less legal than the VCR was when in 1984 the Supreme Court said you can record video off your TV) into something with the force of law, whose enforcement you can outsource to national governments.

What that means is that if you want to do interoperability without permission, if you want to do adversarial interoperability, if you want to add a feature that the manufacturer or the value chain doesn’t want, if you want to encapsulate Gopher inside of the Web to launch a web browser with content from the first day, if you want to add an abstraction layer that lets you interoperate between two different video products so that you can shop between them and find out which one has the better deal, that conduct, which has never been banned by a legislature, becomes radioactively illegal.

It also means, that if you want to implement something that users can modify, you will find yourself at the sharp end of the law, because user modifiability for the core components of the system is antithetical to its goals of controlling user conduct. If there’s a bit you can toggle that says “Turn DRM off now”, then if you turn that bit off, the entire system ceases to work. But the worst part of all is that it makes browsers into no-go zones for security disclosures about vulnerabilities in the browser, because if you know about a vulnerability you could use it to weaken EME. But you could also use it to attack the user in other ways.

Adding DRM to browsers, standardizing DRM as an open standards organization, that’s a compromise. It’s a little compromise, because after all there’s already DRM in the world, and it’s a compromise that’s rational if you believe that DRM is inevitable. If you think that the choice is between DRM that’s fragmented or DRM that we get a say in, that we get to nudge into a better position, then it’s the right decision to make. You get to stick around and do something to make it less screwed up later, as opposed to being self-marginalized by refusing to participate at all.

But if DRM is inevitable, and I refuse to believe that it is, it’s because individually, all across the world, people who started out with the best of intentions made a million tiny compromises that took us to the point where DRM became inevitable, where the computers that are woven into our lives, with increasing intimacy and urgency, are designed to control us instead of being controlled by us. And the reasons those compromises were made is because each one of us thought that we were alone and that no one would have our back, that if we refuse to make the compromise, the next person down the road would, and that eventually, this would end up being implemented, so why not be the one who makes the compromise now.

They were good people, those who made those compromises. They were people who were no worse than you and probably better than me. They were acting unselfishly. They were trying to preserve the jobs and livelihoods and projects of people that they cared about. People who believed that others would not back their play, that doing the right thing would be self-limiting. When we’re alone, and when we believe we’re alone, we’re weak.

It’s not unusual to abuse standards bodies to attain some commercial goal. The normal practice is to get standards bodies to incorporate your patents into a standard, to ensure that if someone implements your standard, you get a nickel every time it ships. And that’s a great way to make rent off of something that becomes very popular. But the W3C was not arm-twisted into adding patents to its standards. That’s because the W3C has the very best patent policy of any standards body in the world. When you come to the W3C to make a standard for the web, you promise not to use your patents against people who implement that standard. And the W3C was able to make that policy at a moment in which it was ascendant, in which people were clamoring to join it, in which it was the first moments of the Web and in which they were fresh.

The night they went on a diet, they were able to throw away all the Oreos in the house. They were where you are now, starting a project that people around the world were getting excited about, that was showing up on the front page of the New York Times. Now that policy has become the ironclad signifier of the W3C. What’s the W3C? It’s the open standards body that’s so open, that you don’t get to assert patents if you join it. And it remains intact.

How will we keep the DMCA from colonizing the Locked Open Web? How will we keep DRM from affecting all of us? By promising to have each others’ backs. By promising that by participating in the Open Web, we take the DMCA off the table. We take silencing security researchers and blocking new entrants to the market off the table now, when we are fresh, when we are insurgent, before we have turned from the pirates that we started out as into the admirals that some of us will become. We take that option off the table.

The EFF has proposed a version of this at the W3C and at other bodies, where we say: To be a member, you have to promise not to use the DMCA to aggress against those, who report security vulnerabilities in W3C standards, and people who make interoperable implementations of W3C standards. We’ve also proposed that to the FDA, as a condition of getting approval for medical implants, we’ve asked them to make companies promise in a binding way never to use the DMCA to aggress against security researchers. We’ve taken it to the FCC, and we’re taking it elsewhere. If you want to sign an open letter to the W3C endorsing this, email me: [email protected]

But we can go further than that, because Ulysses pacts are fantastically useful tools for locking stuff open. It’s not just the paper that you sign when you start your job, that takes a little bit of money out of your bank account every month for your 401k, although that works, too. The U.S. constitution is a Ulysses pact. It understands that lawmakers will be corrupted and it establishes a principled basis for repealing the laws that are inconsistent with the founding principles as well as a process for revising those principles as need be.

A society of laws is a lot harder to make work than a society of code or a society of people. If all you need to do is find someone who’s smart and kind and ask them to make all your decisions for you, you will spend a lot less time in meetings and a lot more time writing code. You won’t have to wrangle and flame or talk to lawyers. But it fails badly. We are all of us a mix of short-sighted and long-term, depending on the moment, our optimism, our urgency, our blood-sugar levels…

We must give each other moral support. Literal moral support, to uphold the morals of the Decentralized Web, by agreeing now what an open internet is and locking it open. When we do that, if we create binding agreements to take certain kinds of conduct off the table for anything that interoperates with or is part of what we’re building today, then our wise leaders tomorrow will never be pressurized to make those compromises, because if the compromise can’t be made, there is no point in leaning on them to make it.

We must set agreements and principles that allow us to resist the song of the Sirens in the future moments of desperation. And I want to propose two key principles, as foundational as life, liberty, and the pursuit of happiness or the First Amendment:

1) When a computer receives conflicting instructions from its owner and from a remote party, the owner always wins.

Systems should always be designed so that their owners can override remote instructions and should never be designed so that remote instructions can be executed if the owner objects to them. Once you create the capacity for remote parties to override the owners of computers, you set the stage for terrible things to come. Any time there is a power imbalance, expect the landlord, the teacher, the parent of the queer kid to enforce that power imbalance to allow them to remotely control the device that the person they have power over uses.

You will create security risks, because as soon as you have a mechanism, hidden from the user, for running code on the user’s computer, anyone who hijacks that mechanism, either by presenting a secret warrant or by exploiting a vulnerability in the system, will be running in a privileged mode that is designed not to be interdicted by the user.

If you want to make sure that people show up at the door of the Distributed Web asking for backdoors, to the end of time, just build in an update mechanism that the user can’t stop. If you want to stop those backdoor requests from coming in, build in binary transparency, so that any time an update ships to one user that’s materially different from the other ones, everybody gets notified and your business never sells another product. Your board of directors will never pressurize you to go along with the NSA or the Chinese secret police to add a backdoor, if doing so will immediately shut down your business.

Throw away the Oreos now.

Let’s also talk about the Computer Fraud and Abuse Act. This is the act that says if you exceed your authorization on someone else’s computer, where that authorization can be defined as simply the terms of service that you click through on your way into using a common service, you commit a felony and can go to jail. Let’s throw that away, because it’s being used routinely to shut down people who discover security vulnerabilities in systems.

2) Disclosing true facts about the security of systems that we rely upon should never, ever be illegal.

We can have normative ways and persuasive ways of stopping people from disclosing recklessly, we can pay them bug bounties, we can have codes of conduct. But we must never, ever give corporations or the state the legal power to silence people who know true things about the systems we entrust our lives, safety, and privacy to.

These are the foundational principles. Computers obey their owners; true facts about risks to users are always legal to talk about. And I charge you to be hardliners on these principles, to be called fanatics. If they are not calling you puritans for these principles, you are not pushing hard enough. If you computerize the world and you don’t safeguard the users of computers from coercive control, history will not remember you as the heroes of progress, but as the blind handmaidens of future tyranny.

This internet, this distributed internet that we are building, the Redecentralization of the Internet, if it ever succeeds, will someday fail, because everything fails, because overwhelmingly, things are impermanent. What it gives rise to next is a function of what we make today. There’s a parable about this:

The state of Roman metallurgy in the era of chariots, determined the wheel base of a Roman chariot, which determined the width of the Roman road, which determined the width of the contemporary road, because they were built atop the ruins of the Roman roads, which determined the wheel base of cars, which determined the widest size that you could have for a container that can move from a ship, to a truck, to a train, which determined the size of a train car, which determined the maximum size of the Space Shuttle’s disposable rockets.

Roman metallurgy prefigured the size of the Space Shuttle’s rockets.

This is not entirely true, there are historians who will explain the glosses in which it’s not true. But it is a parable about what happens when empires fall. Empires always fall. If you build a glorious empire, a good empire, an empire we can all be proud to live in, it will someday fall. You cannot lock it open forever. The best you can hope for is to wedge it open until it falls, and to leave behind the materials, the infrastructure that the people who reboot the civilization that comes after ours will use to make a better world.

A legacy of technology, norms and skills that embrace fairness, freedom, openness and transparency, is a commitment to care about your shared destiny with every person alive today and all the people who will live in the future.

Cory Doctorow: “How Stupid Laws and Benevolent Dictators can Ruin the Decentralized Web, too”
[Transcript by Jonke Suhr]

Categories: Blogs

Video: Guarding the Decentralized Web from its founders’ human frailty

Cory Doctorow - 2016, June 20 - 13:11

Earlier this month, I gave the afternoon keynote at the Internet Archive’s Decentralized Web Summit, speaking about how the people who are building a new kind of decentralized web can guard against their own future moments of weakness and prevent themselves from rationalizing away the kinds of compromises that led to the centralization of today’s web.

The talk was very well-received — it got a standing ovation — and I’ve heard from a lot of people about it since. The video was heretofore only available as a slice of a 9-hour Youtube archive of the day’s proceedings, but thanks to Jeff Kaplan and the Internet Archive, I’ve now got a cut of just my talk, which is on the Internet Archive for your downloading pleasure and mirrored at Youtube (There’s also an MP3).

Categories: Blogs

Image binarization – Otsu’s method

Matlab Image processing blog - 2016, June 14 - 14:35

In my 16-May-2016 post about image binarization, I talked about the new binarization functions in R2016a. Today I want to switch gears and talk about Otsu's method, one of the algorithms underlying imbinarize.

(A bonus feature of today's blog post is a demo of yyaxis, a new feature of MATLAB R2016a.)

Otsu's method is named for Nobuyuki Otsu, who published it in IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, no. 1, January 1979. At this time, researchers had already explored a variety of ways to choose a threshold automatically by examining the histogram of image pixel values. The basic idea is to look for two peaks, representing foreground and background pixel values, and pick a point in between the two peaks as the threshold value.

Here's a simple example using the coins image.

I = imread('coins.png');
imshow(I)
imhist(I)

The function imbinarize calls otsuthresh to get a normalized threshold value.

t = otsuthresh(histcounts(I,-0.5:255.5))

t = 0.4941

Let's see where that threshold is.

hold on
plot(255*[t t], ylim, 'r', 'LineWidth', 5)
hold off

And here is the thresholded coins image.

imshow(imbinarize(I,t))

How does this threshold selection work? It is based entirely on the set of histogram counts. To show the computation, I'll adopt the notation from the paper. Pixels can take on the set of values $i = 1,2,\ldots,L$. The histogram count for pixel value $i$ is $n_i$, and the associated probability is $p_i = n_i/N$, where $N$ is the number of image pixels. (I'm using the word probability here somewhat loosely, in the relative frequency sense.)

The thresholding task is formulated as the problem of dividing image pixels into two classes. $C_0$ is the set of pixels with values $[1,\ldots,k]$, and $C_1$ is the set of pixels with values in the range $[k+1,\ldots,L]$.

The overall class probabilities, $\omega_0$ and $\omega_1$, are:

$$\omega_0 = \sum_{i=1}^k p_i = \omega(k)$$

$$\omega_1 = \sum_{i=k+1}^L p_i = 1 - \omega(k)$$

The class means, $\mu_0$ and $\mu_1$, are the mean values of the pixels in $C_0$ and $C_1$. They are given by:

$$\mu_0 = \sum_{i=1}^k i p_i / \omega_0 = \mu(k)/\omega(k)$$

$$\mu_1 = \sum_{i=k+1}^L i p_i / \omega_1 = \frac{\mu_T - \mu(k)}{1 - \omega(k)}$$

where

$$\mu(k) = \sum_{i=1}^k i p_i$$

and $\mu_T$, the mean pixel value for the total image, is:

$$\mu_T = \sum_{i=1}^L i p_i.$$

The class variances, $\sigma_0^2$ and $\sigma_1^2$, are:

$$\sigma_0^2 = \sum_{i = 1}^k (i - \mu_0)^2 p_i / \omega_0$$

$$\sigma_1^2 = \sum_{i = k+1}^L (i - \mu_1)^2 p_i / \omega_1.$$

Otsu mentions three measures of "good" class separability, built from the within-class variance ($\sigma_W^2$), the between-class variance ($\sigma_B^2$), and the total variance ($\sigma_T^2$). These are given by:

$$\lambda = \sigma_B^2/\sigma_W^2$$

$$\kappa = \sigma_T^2/\sigma_W^2$$

$$\eta = \sigma_B^2/\sigma_T^2$$

where

$$\sigma_W^2 = \omega_0 \sigma_0^2 + \omega_1 \sigma_1^2$$

$$\sigma_B^2 = \omega_0 (\mu_0 - \mu_T)^2 + \omega_1 (\mu_1 - \mu_T)^2 = \omega_0 \omega_1 (\mu_1 - \mu_0)^2.$$

He goes on to point out that, because $\sigma_T^2 = \sigma_W^2 + \sigma_B^2$ is fixed for a given image, maximizing any of these criteria is equivalent to maximizing the others. Further, maximizing $\eta$ is the same as maximizing $\sigma_B^2$, which can be rewritten in terms of the selected threshold, $k$:

$$ \sigma_B^2(k) = \frac{[\mu_T \omega(k) - \mu(k)]^2}{\omega(k) [1 - \omega(k)]}.$$

The equation above is the heart of the algorithm. $\sigma_B^2$ is computed for all possible threshold values, and we choose as our threshold the value that maximizes it.
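As a small aside, the rewritten form follows directly from the definitions above: substituting $\mu_0 = \mu(k)/\omega(k)$ and $\mu_1 = [\mu_T - \mu(k)]/[1 - \omega(k)]$ into $\sigma_B^2 = \omega_0 \omega_1 (\mu_1 - \mu_0)^2$ gives

$$\mu_1 - \mu_0 = \frac{\mu_T - \mu(k)}{1 - \omega(k)} - \frac{\mu(k)}{\omega(k)} = \frac{\mu_T \omega(k) - \mu(k)}{\omega(k) [1 - \omega(k)]},$$

and squaring this and multiplying by $\omega_0 \omega_1 = \omega(k) [1 - \omega(k)]$ recovers the expression for $\sigma_B^2(k)$.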

OK, that was a lot of equations, but there's really not that much involved in computing the key quantity, $\sigma_B^2(k)$. Here's what the computation looks like for the coins image.

counts = imhist(I);
L = length(counts);
p = counts / sum(counts);                 % p_i
omega = cumsum(p);                        % omega(k)
mu = cumsum(p .* (1:L)');                 % mu(k)
mu_t = mu(end);                           % mu_T
sigma_b_squared = (mu_t * omega - mu).^2 ./ (omega .* (1 - omega));

Using yyaxis, a new R2016a feature, let's plot the histogram and $\sigma_B^2$ together.

close all
yyaxis left
plot(counts)
ylabel('Histogram')
yyaxis right
plot(sigma_b_squared)
ylabel('\sigma_B^2')
xlim([1 256])

Otsu's method chooses the place where $\sigma_B^2$ is the highest as the threshold.

[~,k] = max(sigma_b_squared);
hold on
plot([k k],ylim,'LineWidth',5)
hold off

Here's another example. This is a public-domain light microscope image of Lily mitosis. (The original image is courtesy Andrew S. Bajer, University of Oregon, Eugene, OR. This version is slightly cropped.)

url = 'http://blogs.mathworks.com/steve/files/205.jpg';
I = rgb2gray(imread(url));
clf
imshow(I)

counts = imhist(I);
L = length(counts);
p = counts / sum(counts);
omega = cumsum(p);
mu = cumsum(p .* (1:L)');
mu_t = mu(end);
sigma_b_squared = (mu_t * omega - mu).^2 ./ (omega .* (1 - omega));

close all
yyaxis left
plot(counts)
ylabel('Histogram')
yyaxis right
plot(sigma_b_squared)
ylabel('\sigma_B^2')
[~,k] = max(sigma_b_squared);
hold on
plot([k k],ylim,'LineWidth',5)
hold off
xlim([1 256])

clf
imshow(imbinarize(I))
title('Thresholded cell image')

If imbinarize handles this computation automatically, then why did we also provide a function called otsuthresh? The answer is that imbinarize takes an image as input, although Otsu's method does not require the original image, only the image's histogram. If you have a situation where you want to compute a threshold based only on a histogram, then you can call otsuthresh directly. That's why it is there.
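Here's a minimal sketch of that kind of workflow; pooling histograms from a couple of images is just an illustrative scenario.

% Pick one threshold from histograms pooled across several images,
% then apply it to each of them.
files = {'coins.png','eight.tif'};           % example grayscale images
counts = zeros(256,1);
for n = 1:numel(files)
    counts = counts + imhist(imread(files{n}));
end
t = otsuthresh(counts);                      % threshold computed from the pooled histogram alone
imshow(imbinarize(imread(files{1}), t))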

To wrap up this week's discussion, I want to point out that a couple of blog readers recommended something called the Triangle method for automatic gray-scale image thresholding. If you want to try this for yourself, there is an implementation on the File Exchange. I have not had a chance yet to experiment with it.

Next time I'll talk about the algorithm used by imbinarize for locally adaptive thresholding.




Published with MATLAB® R2016a


Categories: Blogs

How we will keep the Decentralized Web decentralized: my talk from the Decentralized Web Summit

Cory Doctorow - 2016, June 9 - 10:24

At yesterday’s Internet Archive Decentralized Web Summit, the afternoon was given over to questions of security and policy.

I gave the opening talk, “How Stupid Laws and Benevolent Dictators can Ruin the Decentralized Web, too,” which was about “Ulysses pacts”: bargains you make with yourself when your willpower is strong to prevent giving in to temptation later when you are tired or demoralized, and how these have benefited the web to date, and how new, better ones can protect the decentralized web of the future.

EFF’s Jeremy Gillula and Noah Swartz — who were there to present Certbot, a tool that produces free cryptographic certificates — wrote up the afternoon, including my talk, and did a good job summarizing it:

He called on the audience to act now to make a Ulysses pact for the decentralized web, because everything eventually fails or falls on hard times. If we want to make sure that the principles and values we hold dear survive, we need to design the systems that embody those principles so that they can’t be compromised or weakened. In other words, we need to build things now so that five or ten or twenty years from now, when what we’ve built is successful and someone asks us to add a backdoor or insert malware or track our users, it simply won’t be possible (for either technological or legal or monetary reasons)—no matter how much outside pressure we’re under.

After all, “The reason the web is closed today is because…people just like you made compromises that seemed like the right compromise to make at the time. And then they made another compromise, a little one. And another one.” He continued, pointing out that “We are, all of us, a mix of short-sighted and long-term…We must give each other moral support. Literal support to uphold the morals of the decentralized web, by agreeing now on what an open decentralized web is.” Only by doing this will we be able to resist the siren song of re-centralization.

And what sort of principles should we agree to? Cory suggests two. First, when a computer receives conflicting instructions from its owner and from a remote party, the owner’s wishes should always take precedence. In other words, no DRM (that means you, W3C). Second, disclosing true facts about the security of systems that we rely upon should never ever be illegal. In other words, we need to work to abolish things like the DMCA, which create legal uncertainty for security researchers disclosing vulnerabilities in systems locked behind DRM. The crowd’s response to this passionate call to action? A standing ovation.

Values, Governance, and What Comes Next: Afternoon Sessions at the Decentralized Web Summit

[Jeremy Gillula and Noah Swartz/EFF]

Categories: Blogs

You are not a wallet: complaining considered helpful

Cory Doctorow - 2016, June 7 - 12:49


My new Guardian column, It’s your duty to complain – that’s how companies improve, is a rebuttal to those who greet public complaints about businesses’ actions with, “Well, just don’t buy from them, then.”


This idea posits that your role in the market is to be a kind of ambulatory wallet, whose only options are to buy, or not to buy. But not only does complaining sometimes solve your problems, it also warns others away from bad decisions, helping better companies thrive.

Finally, some business conduct isn’t just bad, it’s wrong, whether that’s discrimination or unfair trading practices. In those cases, you not only have the right to choose to do business elsewhere, you also have the right to force that company to change the way it operates, and the people who’ve taken on that challenge have done us all a service; they are the reason that we’re not all dying in a fireball every time our cars get rear-ended.

Whenever a complaint comes up about electronic media – games, ebooks, music, movies – and the ways their publishers restrict playback on devices, the “don’t buy it then” squad starts telling you to take your business elsewhere.

Copyright is a deal between the people and rightsholders. Rightsholders get a copyright – an expansive, long-enduring right to control most copying, display, adaptation and performance – when they create something new and fix it in a tangible medium. All the rights not set out in copyright remain in the public’s hands. That means you can’t sell a book with a license agreement that says, “By buying this book whose copyright expires next week, you agree that you will behave as though the copyright expires in the year 2100.” You can’t say, “By buying this book, you agree to vote for Donald Trump,” or “You agree not to let black people or Jews or women read it.”

You – the person reading that book, playing that game, listening to that music – have rights over that work beyond the right to buy or not buy it. You are more than just your wallet.

You have the right to enjoy the media you buy, even when you travel abroad. You have the right to be private in your enjoyment of that media. You have the right to engage in every activity the law doesn’t prohibit.

When those rights are taken away, you have been wronged. You are still wronged even when you stop buying from the company that wronged you – and that’s if you have the choice to find a new supplier; if it’s your ISP who’s doing the bad stuff, chances are there aren’t any better ISPs you can switch to. You have options, like contacting a government agency such as the Office of Fair Trading or the Federal Trade Commission, or a consumer rights organisation like Which? in the UK or Consumers Union in the USA. You have the option of contacting a lawyer.


It’s your duty to complain – that’s how companies improve
[Cory Doctorow/The Guardian]


(Image: Pixabay, PD)

Categories: Blogs

How security and privacy pros can help save the web from legal threats over vulnerability disclosure

Cory Doctorow - 2016, June 1 - 09:56

I have a new op-ed in today’s Privacy Tech, the in-house organ of the International Association of Privacy Professionals, about the risks to security and privacy from the World Wide Web Consortium’s DRM project, and how privacy and security pros can help protect people who discover vulnerabilities in browsers from legal aggression.

I’ve got an open letter to the W3C asking it to extend its existing nonaggression policy — which prohibits members from using patents to threaten those who implement web standards — to cover the weird, dangerous rights conferred by laws like the DMCA, which let companies threaten security researchers who come forward with disclosures of dangerous product defects.

If you’re a privacy or security pro and you want to support this initiative, email me, along with the country you’d like listed with your name, and your institutional affiliation (if any).

Last summer, the U.S. Copyright Office solicited comments on problems with DMCA 1201, and heard from some of the nation’s most respected security researchers, from Bruce Schneier to Steve Bellovin (formerly chief technologist at the Federal Trade Commission, now the first technology scholar for the Privacy and Civil Liberties Oversight Board), and Ed Felten (now White House Deputy Chief Technology Officer).

The researchers spoke as one to say that the DMCA has chilled them from reporting on flaws in technologies from cars and tractors to medical implants to voting machines.

The W3C’s decision to standardize DRM puts it on a collision course with this legal system. The U.S. Trade Representative has exported versions of the DMCA to most of the U.S.’s trading partners, meaning that web users all over the world face the risk that the flaws in their browsers will go unreported because researchers fear retaliation from vendors who want to avert commercial embarrassment (and even legal liability) when those flaws come to light.

EFF would prefer that the W3C not standardize DRM at all: anything that makes it easier for companies to attack security researchers is not good for the open web. But since the W3C rejected that proposal, we’ve offered a compromise: asking the W3C to extend its existing policy on IPRs to protect security researchers.

How you can help white hat security researchers [Privacy Tech/IAPP]

Categories: Blogs

Revealed: the amazing cover for Walkaway, my first adult novel since 2009

Cory Doctorow - 2016, May 26 - 06:54

Next April, Tor Books will publish Walkaway, the first novel I’ve written specifically for adults since 2009; it’s scheduled to be their lead title for the season and they’ve hired the brilliant designer Will Staehle (Yiddish Policeman’s Union, Darker Shade of Magic) for the cover, which Tor has just revealed.


Staehle’s cover features a die-cut dustjacket that offers a peek at the design printed on the boards beneath and highlights the blurb from Edward Snowden (!).

I’ll be going out on a 25-city tour when the book comes out — I hope to see you!

The book was originally titled “Utopia” and you can read about it here; here’s Tor’s summary of the book:

Hubert, Seth, and their ultra-rich heiress friend Natalie are getting a little old to hang out at the “Communist parties,” techno-raveups in abandoned industrial spaces, full of insta-printed drugs and toys. And Natalie was finished, years ago, with her overcontrolling zillionaire dad.

And now that anyone can manufacture food, clothing, shelter with equipment comparable to a computer printer, there seems to be little reason to stick with the world of rules and jobs. So, like hundreds of thousands of others in the mid-21st century, the three of them…walk away.

Mind you, it’s still dangerous out there. Much of the countryside is wrecked by climate change, and predators are with us always. Yet when the initial pioneer walkaways flourish, more people join them. Then the walkaways discover the one thing the ultra-rich have never been able to buy: how to beat death.

Now it’s war—a war that will turn the world upside down.

Fascinating, moving, and darkly humorous, Walkaway is a multi-generation SF thriller about the wrenching changes of the next hundred years…and the very human people who will live their consequences.

Revealing the Cover for Cory Doctorow’s Walkaway

[Tor.com]

Categories: Blogs

Image binarization – new R2016a functions

Matlab Image processing blog - 2016, May 16 - 11:20

In my 09-May-2016 post, I described the Image Processing Toolbox functions im2bw and graythresh, which have been in the product for a long time. I also identified a few weaknesses in the functional designs:

  • The function im2bw uses a fixed threshold value (LEVEL) of 0.5 by default. Using graythresh to determine the threshold value automatically would be a more useful behavior most of the time.
  • If you don't need to save the value of LEVEL, then you end up calling the functions in a slightly awkward way, passing the input image to each of the two functions: bw = im2bw(I,graythresh(I))
  • Although Otsu's method really only needs to know the image histogram, you have to pass in the image itself to the graythresh function. This is awkward for some use cases, such as using the collective histogram of multiple images in a dataset to compute a single threshold.
  • Some users wanted to control the number of histogram bins used by graythresh, which does not have that as an option. (I forgot to mention this item in my previous post.)
  • There was no locally adaptive thresholding method in the toolbox.

For all of these reasons, the Image Processing Toolbox development team undertook a redesign of binarization functionality for the R2016a release. The functional designs are different and the capabilities have been extended. We now encourage the use of a new family of functions: imbinarize, otsuthresh, and adaptthresh.

Binarization using an automatically computed threshold value is now simpler. Instead of two function calls, im2bw(I,graythresh(I)), you can do it with one, imbinarize(I).

I = imread('cameraman.tif');
imshow(I)
xlabel('Cameraman image courtesy of MIT')

bw = imbinarize(I);
imshowpair(I,bw,'montage')

In addition to global thresholding, imbinarize can also do locally adaptive thresholding. Here is an example using an image with a mild illumination gradient from top to bottom.

I = imread('rice.png');
bw = imbinarize(I);
imshowpair(I,bw,'montage')
title('Original and global threshold')

You can see that the rice grains at the bottom of the image are imperfectly segmented because they are in a darker portion of the image. Now switch to an adaptive threshold.

bw = imbinarize(I,'adaptive');
imshowpair(I,bw,'montage')
title('Original and adaptive threshold')

Here is a more extreme example of nonuniform illumination.

I = imread('printedtext.png');
imshow(I)
title('Original image')

bw = imbinarize(I);
imshow(bw)
title('Global threshold')

Let's see how using an adaptive threshold can improve the results. Before jumping into it, though, notice that the foreground pixels in this image are darker than the background, which is the opposite of the rice grains image above. The adaptive method works better if it knows whether to look for foreground pixels that are brighter or darker than the background. The optional parameter 'ForegroundPolarity' lets us specify that.

bw = imbinarize(I,'adaptive','ForegroundPolarity','dark');
imshow(bw)
title('Adaptive threshold')

The new functions otsuthresh and adaptthresh are for those who want to have more fine-grained control over the algorithms underlying the global and adaptive thresholding behavior of imbinarize. I'll talk about them next time.
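To give a feel for how the pieces fit together, here is a minimal sketch (mine, not an excerpt from the post) of both workflows. The two toolbox sample images combined into one histogram are just an illustration of the shared-threshold use case from the list above; the calls follow the patterns otsuthresh(counts), adaptthresh(I,...), and imbinarize(I,T).

% Sketch only: one global threshold computed from the combined histogram of
% two images -- the histogram-only use case the old design made awkward.
I1 = imread('coins.png');
I2 = imread('rice.png');
counts = imhist(I1) + imhist(I2);   % collective 256-bin histogram
t = otsuthresh(counts);             % normalized threshold in [0,1]
bw1 = imbinarize(I1,t);
bw2 = imbinarize(I2,t);

% Sketch only: compute an adaptive threshold image, then apply it.
I = imread('printedtext.png');
T = adaptthresh(I,'ForegroundPolarity','dark');   % threshold image, same size as I
bw = imbinarize(I,T);

The point of the split is that otsuthresh never needs the pixels themselves, and adaptthresh hands you the threshold image as an ordinary array that you can inspect or modify before binarizing.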




Published with MATLAB® R2016a


Categories: Blogs

Actor Tailor Soldier Spy

Casey McKinnon - 2016, May 16 - 10:49




I did a quick shoot with the Headshot Truck last week to refresh my headshots and get some photos of character types. My agent was enthusiastic about getting a powerful shot in a suit for roles like manipulative politician, lawyer, and agent (of the FBI, of real estate, of A.C.R.O.N.Y.M.S., etc.). The second look she wanted was a strong army look, which could also work great for roles like resistance fighter, local militia member, or apocalypse survivor. And, thanks to the efficient photographer in the Headshot Truck, and my own over-preparedness, I was able to sneak in a third look...  a somewhat period appropriate (and somewhat inappropriate) girl next door type.

I had a good experience with the Headshot Truck, and I may choose to visit them in the future for another look; perhaps doctor/scientist, nerdy intellectual, or Shakespearean ingenue? We shall see. In the meantime, I'm very pleased with the results and I hope they serve their purpose well.

Categories: Blogs

O’Reilly Hardware Podcast on the risks to the open Web and the future of the Internet of Things

Cory Doctorow - 2016, May 11 - 10:36

I appeared on the O’Reilly Hardware Podcast this week (MP3), talking about the way that DRM has crept into all our smart devices, which compromises privacy, security and competition.

In this episode of the Hardware podcast, we talk with writer and digital rights activist Cory Doctorow. He’s recently rejoined the Electronic Frontier Foundation to fight a World Wide Web Consortium proposal that would add DRM to the core specification for HTML. When we recorded this episode with Cory, the W3C had just overruled the EFF’s objection. The result, he says, is that “we are locking innovation out of the Web.”

“It is illegal to report security vulnerabilities in a DRM,” Doctorow says. “[DRM] is making it illegal to tell people when the devices they depend upon for their very lives are unsuited for that purpose.”

In our “Tools” segment, Doctorow tells us about tools that can be used for privacy and encryption, including the EFF surveillance self-defense kit, and Wickr, an encrypted messaging service that allows for an expiration date on shared messages and photos. “We need a tool that’s so easy your boss can use it,” he says.

Cory Doctorow on losing the open Web [O’Reilly Hardware Podcast]

Categories: Blogs

Peace in Our Time: how publishers, libraries and writers could work together

Cory Doctorow - 2016, May 9 - 17:33


Publishing is in a weird place: ebook sales are stagnating; publishing has shrunk to five major publishers; libraries and publishers are at each other’s throats over ebook pricing; major writers’ groups are up in arms over ebook royalties; and, of course, we only have one major book retailer left — what is to be done?


In my new Locus Magazine column, “Peace in Our Time,” I propose a pair of software projects that could bring writers, publishers and libraries together to increase competition, give publishers the market intelligence they need to sell more books, triple writers’ ebook royalties, and sell more ebooks to libraries, on much fairer terms.

The first project is a free/open version of Overdrive, the software that publishers insist that libraries use for ebook circulation. A free/open version, collectively created and maintained by the library community, would create a source of data that publishers could use to compete with Amazon, their biggest frenemy, while still protecting patron privacy. The publishers’ quid-pro-quo for this data would be an end to the practice of gouging libraries on ebook prices, leaving them with more capital to buy more books.

The second project is a federated ebook store for writers that would allow writers to act as retailers for their publishers, selling their own books and keeping the retailer’s share in addition to their traditional royalty: a move that would increase the writer’s share by 300%, without costing the publishers a penny. Writer-operated ebook stores, spread all over the Web but searchable from central portals, do not violate the publishers’ agreements with Amazon, but they do create a new sales category: “fair trade ebooks,” whose sale gives the writers you love the money to feed their families and write more books — without costing you anything extra.

Amazon knows, in realtime, how publishers’ books are performing. It knows who is buying them, where they’re buying them, where they’re reading them, what they searched for before buying them, what other books they buy at the same time, what books they buy before and after, whether they read them, how fast they read them, and whether they finish them.

Amazon discloses almost none of this to the publishers, and what information they do disclose to the publishers (the sales data for the publishers’ own books, atomized, without data-mineable associations) they disclose after 30 days, or 90 days, or 180 days. Publishers try to fill in the gaps by buying their own data back from the remaining print booksellers, through subscriptions to point-of-sale databases that have limited relevance to e-book performance.

There is only one database of e-book data that is remotely comparable to the data that Amazon mines to stay ahead of the publishers: e-book circulation data from public libraries. This data is not as deep as Amazon’s – thankfully, since it’s creepy and terrible that Amazon knows about your reading habits in all this depth, and it’s right and fitting that libraries have refused to turn on that kind of surveillance for their own e-book circulation.

Peace in Our Time [Cory Doctorow/Locus]

Categories: Blogs

Image binarization – im2bw and graythresh

Matlab Image processing blog - 2016, May 9 - 11:41

As I promised last time, I'm writing a series about functional designs for image binarization in the Image Processing Toolbox. Today I'll start by talking about im2bw and graythresh, two functions that have been in the product for a long time.

The function im2bw appeared in Image Processing Toolbox version 1.0, which shipped in early fall 1993. That was about the time I interviewed for my job at MathWorks. (I was a beta tester of version 1.0.)

Here is the help text from that early function:

%IM2BW Convert image to black and white by thresholding.
%   BW = IM2BW(X,MAP,LEVEL) converts the indexed image X with
%   colormap MAP to a black and white intensity image BW.
%   BW is 0 (black) for all pixels with luminance less
%   than LEVEL and 1 (white) for all other values.
%
%   BW = IM2BW(I,LEVEL) converts the gray level intensity image
%   I to black and white. BW is 0 (black) for all pixels with
%   value less than LEVEL and 1 (white) for all other values.
%
%   BW = IM2BW(R,G,B,LEVEL) converts the RGB image to black
%   and white. BW is 0 (black) for all pixels with luminance
%   less than LEVEL and 1 (white) for all other values.
%
%   See also IND2GRAY, RGB2GRAY.

At that time, the prefix "im" in the function name meant that the function could take more than one image type (indexed, intensity, RGB).

At this point in the early history of MATLAB, the language really only had one type. Everything in MATLAB was a double-precision matrix. This affected the early functional design in two ways. First, the toolbox established [0,1] as the conventional dynamic range for gray-scale images. This choice was influenced by the mathematical orientation of MATLAB as well as the fact that there was no one-byte-per-element data type. The second impact on functional design can be seen in the syntax IM2BW(R,G,B,LEVEL). RGB (or truecolor) images had to be represented with three different matrices, one for each color component. I really don't miss those days!

Here are two examples, an indexed image and a gray-scale image.

[X,map] = imread('trees.tif');
imshow(X,map);
title('Original indexed image')

bw = im2bw(X,map,0.5);
imshow(bw)
title('Output of im2bw')

I = imread('cameraman.tif');
imshow(I)
title('Original gray-scale image')
xlabel('Cameraman image courtesy of MIT')

bw = im2bw(I,0.5);
imshow(bw)
title('Output of im2bw')

It turns out that im2bw had other syntaxes that did not appear in the documentation. Specifically, the LEVEL argument could be omitted. Here is relevant code fragment:

if isempty(level), % Get level from user
    level = 0.5;   % Use default for now
end

Experienced software developers will be amused by the code comment above, "Use default for now". This indicates that the developer intended to go back and do something else here before shipping but never did. Anyway, you can see that a LEVEL of 0.5 is used if you don't specify it yourself.

MATLAB 5 and Image Processing Toolbox version 2.0 shipped in early 1998. These were very big releases for both products. MATLAB 5 featured multidimensional arrays, cell arrays, structs, and many other features. MATLAB 5 also had something else that was big for image processing: numeric arrays that weren't double precision. At the time, you could make uint8, int8, uint16, int16, uint32, int32, and single arrays. However, there was almost no functional or operator support for these arrays. The capability was so limited that we didn't even mention it in the MATLAB 5 documentation.

Image Processing Toolbox 2.0 provided support for (and documented) uint8 arrays. The other types went undocumented and largely unsupported in both MATLAB and the toolbox for a while longer.

Multidimensional array and uint8 support affected almost every function in the toolbox, so version 2.0 was a complex release, especially with respect to compatibility. We wanted to be able to handle uint8 and multidimensional arrays smoothly, to the degree possible, with existing user code.

One of the design questions that arose during this transition concerned the LEVEL argument for im2bw. Should the interpretation of LEVEL be different, depending on the data type of the input image? To increase the chance that existing user code would work as expected without change, even if the image data type changed from double to uint8, we adopted the convention that LEVEL would continue to be specified in the range [0,1], independent of the input image data type. That is, a LEVEL of 0.5 has the same visual effect for a double input image as it does for a uint8 input image.
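As a quick check of that convention (my own sketch, not from the original post), thresholding a uint8 image and its scaled double counterpart at the same LEVEL produces the same binary image:

I8 = imread('cameraman.tif');   % uint8 image, values 0..255
Id = im2double(I8);             % same image rescaled to [0,1]
bw8 = im2bw(I8,0.5);            % LEVEL interpreted relative to the uint8 range
bwd = im2bw(Id,0.5);            % LEVEL used directly for the double image
isequal(bw8,bwd)                % returns true, per the convention described above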

Now, image processing as a discipline is infamous for its "magic numbers," such as threshold values like LEVEL, that need to be tweaked for every data set. Sometime around 1999 or 2000, we reviewed the literature about algorithms to compute thresholds automatically. There were only a handful that seemed to work reasonably well for a broad class of images, and one in particular seemed to be both popular and computationally efficient: N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, 1979, pp. 62-66. This is the one we chose to implement for the toolbox. It is the algorithm under the hood of the function graythresh, which was introduced in version 3.0 of the toolbox in 2001.

The function graythresh was designed to work well with the function im2bw. It takes a gray-scale image and returns the same normalized LEVEL value that im2bw uses. For example:

level = graythresh(I)

level =

    0.3451

bw = im2bw(I,level);
imshow(bw)
title('Level computed by graythresh')

Aside from multilevel thresholding introduced in R2012b, this has been the state of image binarization in the Image Processing Toolbox for about the last 15 years.
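(If you haven't used the multilevel option, it lives in a separate pair of functions. The following is just a sketch of the pattern I believe the R2012b functions multithresh and imquantize follow, not an excerpt from the post.)

% Sketch only: split an image into three classes using two Otsu-style thresholds.
I = imread('coins.png');
levels = multithresh(I,2);      % two thresholds computed from the histogram
labels = imquantize(I,levels);  % label matrix with values 1, 2, and 3
imshow(label2rgb(labels))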

There are a few weaknesses in this set of functional designs, though, and these weaknesses eventually led the development team to consider an overhaul.

  • Most people felt that the value returned by graythresh would have been a better default LEVEL than 0.5.
  • If you don't need to save the value of LEVEL, then you end up calling the functions in a slightly awkward way, passing the input image to each of the two functions: bw = im2bw(I,graythresh(I))
  • Although Otsu's method really only needs to know the image histogram, you have to pass in the image itself to the graythresh function. This is awkward for some use cases, such as using the collective histogram of multiple images in a dataset to compute a single threshold.
  • There was no locally adaptive thresholding method in the toolbox.

Next time I plan to discuss the new image binarization functional designs in R2016a.

Also, thanks very much to ez, PierreC, Matt, and Mark for their comments on the previous post.




Published with MATLAB® R2016a


Categories: Blogs

The open web’s guardians are acting like it’s already dead

Cory Doctorow - 2016, May 3 - 11:02

The World Wide Web Consortium — an influential standards body devoted to the open web — used to make standards that would let anyone make a browser that could view the whole Web; now they’re making standards that let the giant browser companies and giant entertainment companies decide which browsers will and won’t work on the Web of the future.

When you ask them why they’re doing this, they say that the companies are going to build technology that locks out new entrants no matter what they do, and by capitulating to them, at least there’s a chance of softening the control the giants will inevitably get.

In my latest Guardian column, Why the future of web browsers belongs to the biggest tech firms, I explain how the decision of the W3C to let giant corporations lock up the Web betrays a belief that the open Web is already dead, and all that’s left to argue about are the terms on which our new overlords will present it to us.

Today is the International Day Against DRM. EME, the W3C project that hands control over the Web to giant corporations, uses DRM to assert this control.

We will get the open Web we deserve. If you and I and everyone we know stand up to the bullies who want to use entertainment technology to seize control over the future, we can win.

Otherwise, we’ll be Huxleyed into the full Orwell.

Make it easy for today’s crop of web giants to sue any new entrants into oblivion and you can be pretty certain there won’t be any new entrants.

It marks a turning point in the history of those companies. Where once web giants were incubators for the next generation of entrepreneurs who struck out and started competitors that eclipsed their former employers, now those employees are setting the stage for a future where they can stay where they are, or slide sideways to another giant. Forget overturning the current order, though. Maybe they, too, think the web is cooked.

In case there was any doubt of where the W3C stood on whether the future web needed protection from the giants of today, that doubt was dispelled last month. Working with the Electronic Frontier Foundation, I proposed that the W3C adapt its existing policies – which prohibit members from using their patents to block new web companies – to cover EME, a move that was supported by many W3C members.

Rather than adopt this proposal or a version of it, last month, the W3C executive threw it out, giving the EME group a green light to go forward with no safeguards whatsoever.

Why the future of web browsers belongs to the biggest tech firms
[The Guardian]

Categories: Blogs