
Christopher Allan Webber: Hitchhiker's guide to data formats


Just thinking out loud this morning about what data formats there are and how they work with the world:

  • XML: The 2000s' hippest technology. Combines a clear, parsable, tree-based syntax with extension mechanisms and a schema system. Still moderately popular, though not as popular as it once was. Tons of tooling. Many seem to think the tooling makes it overly complex, and JSON has taken over much of its territory. Has the advantage of unambiguity over vanilla JSON, if you know how to use it right, but it takes more effort to work with.
  • SGML: XML's soupier grandmother. Influential.
  • HTML: Kind of like SGML and XML, but for one specific kind of data: web documents. Too bad XHTML never fulfilled its dream. Without XHTML it's even soupier than SGML, but there's enough tooling for soup-processing that most developers don't worry about it.
  • JSON: Also tree-based, but keeps things minimal: just your basic types. Loved by web developers everywhere. Also ambiguous, since on its own it's schema-free... which may lead to conflicts between applications. But if you know the source and the destination perfectly, it's fine. Has the advantage of mapping onto basic types in pretty much every language, plus widespread tooling. (Don't be evil about being evil, though? #vaguejokes) If you want to send JSON between a lot of locations and be unambiguous in your meaning, or if you want more than just the basic types provided, you're going to need something more... we'll come to that in a bit.
  • S-expressions: the language of Lisp. Lispers claim you can represent anything as s-expressions, which is true, but on its own that's kind of ambiguous. Capable of representing code just as well as data, which is why lispers claim the benefits of symmetry and "code that can write code". However, serializing "pure data" is also perfectly possible with s-expressions. So many variations between languages, though... it's more of a "generalized family", or better yet a pattern, of data (and code) formats (a tiny reader sketch follows this list). There are some damn cool representations of some of these other formats via sexps. Some people get scared away by all the parens, which is too bad, because (though this strays into code + data, not just data) homoiconicity can't be beat. (Maybe Wisp can help there?)
  • Canonical s-expressions: s-expressions with a canonical representation... cool! Most developers don't know about it, but it was designed for public-key cryptography and is still actively used there (libgcrypt uses canonical s-expressions under the hood, for instance). No schema system, and really just lists and binary strings, but the binary strings can be marked with "display hints" so systems know how to unpack the data into appropriate types. (See the minimal encoder sketch after this list.)
  • RDF and friends: The "unicode" of graph-oriented data. Not a serialization itself, but a specification for the conceptual modeling of data; you'll hear "linked data" people talking about it a lot. A graph of "subject, predicate, object" triples. Pretty cool once you learn what it is, though the introductory material is really overwhelming. (Also, good luck representing ordered lists.) However, there is no one serialization of RDF, which leads to much confusion among many developers (including myself, for a long time, despite being told otherwise). For example, rdf/xml looks like XML, but woe be upon ye who uses XML tooling upon it. So, deserialize to RDF, then deal with RDF in RDF land, then serialize again... that's the way to go with RDF (the rdflib sketch after this list shows the idea). It has saner formats than just rdf/xml; Turtle, for example, is easy to read. The RDF community seems to get mad when you want to interpret data as anything other than RDF, which can be very off-putting, though the goal of a "platonic form" of data is highly admirable. That said, graph-based tooling is definitely harder for most developers to work with than tree-based tooling, but hopefully "the jQuery of RDF" library will become available some day, and things will be easier. Interesting stuff to learn, anyway!
  • json-ld: A "linked data format" that can technically transform itself into RDF, but unlike other RDF syntaxes, it can often be parsed on its own as plain JSON. So, say you want to have JSON and keep things easy for most of your users, who just use their favorite interpreted language to extract key-value pairs from your API. Okay, no problem for them! But suddenly you're also consuming JSON from multiple origins, and one of them uses "run" to say "run a mile" whereas your system uses "run" to mean "run a program". How do you tell these apart? With json-ld you can "expand" a JSON representation with a supplied context into an unambiguous form, and you can "compact" it down again to the terms you know and understand in your system, leaving out those you don't (the expand/compact sketch after this list makes this concrete). No more executing a program for a mile!
  • Microformats and RDFa: Two communities which have been notoriously and exasperatingly at odds with each other for over a decade, so why do I link them together? Well, both take the same approach of embedding data in HTML. Great when you have HTML for your data to go with, though not all data needs an HTML wrapper. But it's good to be able to extract it! RDFa simply extracts to RDF, which we've discussed plenty; Microformats extracts to its own thing. A frequent point of contention between these groups is vocabulary, and how to represent it. RDFa people like their vocabulary to have canonical URIs for each term (well, that's an RDF thing, so not surprising); Microformats people like to document everything in a wiki. Arguments about extensibility are a frequent topic... if you want to get into that, see Amy Guy's summary of things.
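
To make the "pattern of formats" point concrete, here's a minimal s-expression reader, sketched in Python. It's a toy, not any particular Lisp's reader, but the fact that it fits in a dozen lines is part of the appeal:

    # A minimal s-expression reader: nested lists of atoms, nothing more.
    # Real Lisp readers also handle strings, numbers, quoting, and comments.
    def tokenize(text):
        return text.replace("(", " ( ").replace(")", " ) ").split()

    def read(tokens):
        token = tokens.pop(0)
        if token == "(":
            items = []
            while tokens[0] != ")":
                items.append(read(tokens))
            tokens.pop(0)  # drop the closing ")"
            return items
        return token  # a bare atom

    print(read(tokenize("(define (square x) (* x x))")))
    # ['define', ['square', 'x'], ['*', 'x', 'x']]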
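
And the canonical flavor: a minimal canonical s-expression (csexp) encoder, a sketch that assumes values are just byte strings and nested lists, as described above (display hints and real-world framing are omitted):

    # Canonical s-expressions: byte strings are length-prefixed ("3:foo"),
    # lists are parenthesized, and any structure has exactly one encoding.
    def csexp(value):
        if isinstance(value, bytes):
            return str(len(value)).encode() + b":" + value
        return b"(" + b"".join(csexp(item) for item in value) + b")"

    print(csexp([b"public-key", [b"rsa", [b"e", b"\x01\x00\x01"]]]))
    # b'(10:public-key(3:rsa(1:e3:\x01\x00\x01)))'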

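Here's the "deal with RDF in RDF land" workflow as a Python sketch, assuming the rdflib library; the example URIs are invented for illustration. Parse a serialization into a graph, work with triples, and only serialize again at the edges:

    from rdflib import Graph

    turtle = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/alice> foaf:name "Alice" ;
                               foaf:knows <http://example.org/bob> .
    """

    graph = Graph()
    graph.parse(data=turtle, format="turtle")

    # In RDF land, every statement is a (subject, predicate, object) triple.
    for subject, predicate, obj in graph:
        print(subject, predicate, obj)

    # Serialize back out to a different RDF syntax, without ever having
    # poked at the input's syntax directly.
    print(graph.serialize(format="xml"))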
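And the json-ld expand/compact dance from the bullet above, as a sketch assuming the pyld library; the context URL and the "run" terms are made up for illustration:

    from pyld import jsonld

    # A document from a fitness app: here "run" means "run a mile".
    fitness_doc = {
        "@context": {"run": "http://example.org/fitness#run"},
        "run": "one mile"
    }

    # Expansion replaces short terms with unambiguous full URIs...
    expanded = jsonld.expand(fitness_doc)
    print(expanded)
    # [{'http://example.org/fitness#run': [{'@value': 'one mile'}]}]

    # ...and compaction maps them back to the terms *your* system knows,
    # so a fitness "run" can never be mistaken for running a program.
    my_context = {"@context": {"exercise": "http://example.org/fitness#run"}}
    print(jsonld.compact(expanded, my_context))
    # {'@context': {...}, 'exercise': 'one mile'}
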
Of course, there are more data formats than that. Heck, even on top of these data formats there's a lot more out there (these days I spend a lot of time working on ActivityStreams 2.0 related tooling, which is just JSON with a specific structure until you want to get fancier, add extensions, or jump into linked-data land, in which case you can process it as json-ld). And maybe you'd also find stuff like Cap'n Proto or Protocol Buffers interesting. But the above are the formats that, today, I find most interesting or impactful on my day-to-day work. I hope this guide was interesting to you!

jhart: Our lives are lived through the data that we create.

An animated guide to all the different ways of making coffee


jhart: Happy Coffee Day.

Security wares like Kaspersky AV can make you more vulnerable to attacks


A screenshot showing proof-of-concept exploit code working against Kaspersky antivirus software. (credit: Tavis Ormandy)

Antivirus applications and other security software are supposed to make users more secure, but a growing body of research shows that in some cases, they can open people to hacks they otherwise wouldn't be vulnerable to.

The latest example is antivirus and security software from Kaspersky Lab. Tavis Ormandy, a member of Google's Project Zero vulnerability research team, recently analyzed the widely used programs and quickly found a raft of easy-to-exploit bugs that made it possible to remotely execute malicious code on the underlying computers. Kaspersky has already fixed many of the bugs and is in the process of repairing the remaining ones. In a blog post published Tuesday, he said it's likely he's not the only one to know of such game-over vulnerabilities.

"We have strong evidence that an active black market trade in antivirus exploits exists," he wrote, referring to recent revelations that hacked exploit seller Hacking Team sold weaponized attacks targeting antivirus software from Eset.

He continued: "Research shows that it’s an easily accessible attack surface that dramatically increases exposure to targeted attacks. For this reason, the vendors of security products have a responsibility to uphold the highest secure development standards possible to minimise the potential for harm caused by their software. Ignoring the question of efficacy, attempting to reduce one’s exposure to opportunistic malware should not result in an increased exposure to targeted attacks."

As Ormandy suggested, the bugs he found in Kaspersky products would most likely be exploited in highly targeted attacks, such as those the National Security Agency might carry out against a terrorism suspect or spies pursuing an espionage campaign might carry out against the CEO of a large corporation. That means most people are probably better off running antivirus software than foregoing it, at least if their computers run Windows. Still, the results are concerning because they show that the very software we rely on to keep us safe in many cases makes us more vulnerable.

Kaspersky isn't the only security software provider to introduce bugs into its products. Earlier this month, security researcher Kristian Erik Hermansen reported finding four vulnerabilities in the core product marketed by security firm FireEye. One of them made it possible for attackers to retrieve sensitive password data stored on the server running the program. Ormandy has also uncovered serious vulnerabilities in AV software from Sophos and Eset.

In a statement, Kaspersky Lab officials wrote, "We would like to assure all our clients and customers that vulnerabilities publicly disclosed in a blogpost by Google Project Zero researcher, Mr. Tavis Ormandy, have already been fixed in all affected Kaspersky Lab products and solutions. Our specialists have no evidence that these vulnerabilities have been exploited in the wild."

The statement went on to say that Kaspersky Lab developers are making architectural changes to their products that will let them better resist exploit attempts. One change included the implementation of stack buffer overflow protection, which Ormandy referred to as "/GS" in his blog post. Other planned changes include the expansion of mitigations such as address space layout randomization and data execution prevention (for much more on these security measures see How security flaws work: The buffer overflow by Ars Technology Editor Peter Bright). Ormandy thanked Kaspersky Lab for its "record breaking response times" following his report.

Still, the message is clear. To do its job, security software must acquire highly privileged access to the computers it protects, and all too often this sensitive position can be abused. Ormandy recommended that AV developers build security sandboxes into their products that isolate downloaded files from core parts of the computer's operating system.

"The chromium sandbox is open source and used in multiple major products," he wrote. "Don't wait for the network worm that targets your product, or for targeted attacks against your users, add sandboxing to your development roadmap today."


Ameritrade's thinkorswim Challenge: Teaching America's Youth How To Invest

Benzinga: Nicole Sherrod, managing director of trading for TD Ameritrade, said the company's college-level virtual trading competition, the thinkorswim Challenge, could help alleviate Gen Z's investing problems.
jhart: Amazing competition

App Submissions on Google Play Now Reviewed by Staff

jhart: To all Android developers out there, make sure you update your ratings.

Don’t call them “utility” rules: The FCC’s net neutrality regime, explained


Within a few weeks we’ll have a huge document full of legalese on the Federal Communications Commission’s net neutrality rules, to replace the near-200-page order from 2010 that was mostly overturned by a court ruling last year.

But there are enough details in the 4-page summary of FCC Chairman Tom Wheeler’s proposal released today for us to tell you in general terms what it does and doesn’t do. FCC officials also provided further background in a phone call with reporters today. One thing they were clear on: this isn’t “utility-style regulation,” because there will be no rate regulation, Internet service providers (ISPs) won’t have to file tariffs, and there’s no unbundling requirement that would force ISPs to lease network access to competitors.

But the order does reclassify ISPs as common carriers, regulating them under Title II of the Communications Act, the same statute that governs telephone companies. ISPs will not be allowed to block or throttle Internet content, nor will they be allowed to prioritize content in exchange for payments. The rules will apply to home Internet service such as cable, DSL, and fiber, and to mobile broadband networks generally accessed with smartphones.

Internet providers will be common carriers in their relationships with home Internet and mobile broadband customers; they will also be common carriers in their relationships with companies that deliver content to subscribers over the networks operated by ISPs. That includes online content providers such as Amazon or Netflix.

The rules apply only to retail Internet providers, those that offer consumers the ability to access the Internet. They do not regulate Web applications or other network operators. Content delivery networks like Akamai, which improve performance by optimizing delivery of content across the Internet, would not be affected by the paid prioritization ban.

Here’s a more detailed look at what kinds of rules both fixed and mobile broadband providers will and won’t have to follow (assuming the commission approves Wheeler’s plan on Feb. 26).

Three “bright line rules”

The ban on blocking, throttling, and paid prioritization is the biggest takeaway.  “Broadband providers may not block access to legal content, applications, services, or non-harmful devices… may not impair or degrade lawful Internet traffic on the basis of content, applications, services, or non-harmful devices... [and] may not favor some lawful Internet traffic over other lawful traffic in exchange for consideration—in other words, no ‘fast lanes.’ This rule also bans ISPs from prioritizing content and services of their affiliates,” the FCC said. The core provisions of Title II banning “unjust and unreasonable practices” will be used to enforce these rules.

Data caps

There’s no ban on data caps, but the proposal would let the FCC intervene when caps are used to harm consumers or competitors. Cellular providers have been experimenting with “zero-rating,” letting consumers access certain services without using up their data allotments. AT&T is charging companies for the right to deliver data without counting against customers’ caps; T-Mobile exempts certain music services from caps, but without charging anyone.

FCC officials on the call with reporters seemed less concerned about data exemptions that occur without payment than those that require payment, but did not commit to banning any particular type of practice. The matter will be handled on a case-by-case basis to determine whether a zero-rating program hinders competition for "over-the-top" services, those provided over an Internet connection. In the net neutrality order, the FCC is not taking any stance on specific programs provided today.

Transparency

Though the 2010 order’s anti-blocking and anti-discrimination rules were thrown out because of a lawsuit filed by Verizon, the court did not object to requirements that ISPs tell the public about their network management practices. ISPs will face greater disclosure requirements in the new proposal, but the fact sheet didn’t say exactly how the rules will be different.

Possible loophole? “Reasonable network management”

Net neutrality advocates have worried that exceptions to anti-discrimination rules would render them meaningless. Wheeler is allowing for “reasonable network management,” which “recognizes the need of broadband providers to manage the technical and engineering aspects of their networks.”

But ISPs cannot claim “reasonable network management” in order to meet a business need. “For example, a provider can’t cite reasonable network management to justify reneging on its promise to supply a customer with ‘unlimited’ data,” the FCC said.

Some data services that don’t go over the public Internet will be largely exempt from Title II oversight. VoIP phone service offered by a cable provider is one example; another is a heart-monitoring service that doesn’t use the public Internet. These exceptions don’t change the transparency requirements, which “will continue to cover any offering of such non-Internet data services—ensuring that the public and the Commission can keep a close eye on any tactics that could undermine the Open Internet rules.”

A standard to cover unforeseen misbehavior

Just in case ISPs come up with some new way of creating a non-neutral Internet, there will be a “standard for future conduct” that would help the FCC determine whether new practices should be allowed. While not fully defined in the fact sheet, “the proposal would create a general Open Internet conduct standard that ISPs cannot harm consumers or edge providers.”

Netflix’s favorite part: interconnection disputes

Netflix and some other companies have complained about the prices ISPs charge for direct network connections. These connections ensure a smooth path into the network but don’t provide any priority thereafter. The net neutrality proposal doesn’t ban these agreements, but gives the FCC “authority to hear complaints and take appropriate enforcement action if necessary, if it determines the interconnection activities of ISPs are not just and reasonable, thus allowing it to address issues that may arise in the exchange of traffic between mass-market broadband providers and edge providers.” Besides companies like Netflix, content delivery networks such as Akamai or transit providers such as Cogent could bring complaints to the FCC.

New taxes and fees? Nope

Some Title II opponents tried to convince the FCC that Title II would bring $15 billion in new user fees per year, causing millions of households to stop subscribing to Internet service.

That’s simply not true, the FCC said. “The Order will not impose, suggest or authorize any new taxes or fees,” the commission said. The moratorium on Internet taxation will continue, as required by Congress. Today’s order does not require broadband providers to contribute to the Universal Service Fund (USF), which subsidizes telecommunications projects in underserved areas.

FCC officials noted that they have already begun a separate proceeding on USF funding that could ultimately put a USF charge on customers’ bills, similar to the USF charges on telephone bills. That will proceed independently of the Title II decision. But even so, the FCC could keep the entire Universal Service Fund the same size, so that surcharges on broadband would be offset by reductions in surcharges on phone lines.

While USF fees won’t be applied because of this order, the FCC said it will boost “universal service fund support for broadband service in the future through partial application” of the USF portion of Title II.

Google gets what it wanted: Pole access

Google asked the FCC to enforce Title II rules guaranteeing access to poles, rights-of-way, and other infrastructure controlled by utilities, making it easier for Google Fiber to enter new markets. The FCC said it would enforce the part of Title II that “ensures fair access to poles and conduits” to help new broadband providers.

It’s not clear how much this will really help. Google had a dispute with AT&T over pole access in Austin, Texas, but the companies settled and Google doesn’t seem to have been shut out of any market because of pole attachment problems. And broadband providers could actually end up paying higher pole attachment rates than they did before because “how you’re classified affects what you have to pay,” a cable industry lawyer explained to Ars.

Forbearance

As noted earlier, the FCC plan is to avoid imposing the strictest portions of Title II in a legal process known as “forbearance.” ISPs have complained that forbearance is too onerous a process but the FCC made it sound pretty simple: the commission simply won’t apply things like rate regulation, unbundling, or new taxes and fees. There will also be “no burdensome administrative filing requirements or accounting standards,” the FCC said.

Other Title II provisions that will apply to ISPs

There are dozens of sections in Title II, and some we haven’t yet mentioned will apply. These include investigations of consumer complaints, privacy protections, and protections for people with disabilities. The FCC will have to pick up some of the consumer protection functions performed by the Federal Trade Commission, which is prohibited from taking actions against telecommunications common carriers.

It isn’t the end of the world—really

ISPs have argued that Title II will bring certain doom to the broadband industry, but the FCC pointed to past experience to argue that it won’t. Though Title II hasn’t applied to wireless data before, it does apply to wireless voice, and that industry is thriving.

“For 21 years the wireless industry has been governed by Title II-based rules that forbear from traditional phone company regulation,” the FCC said. “The wireless industry has invested over $400 billion under similar rules, proving that modernized Title II regulation can support investment and competition.”

Broadband will actually face fewer Title II provisions than cellular voice. “When Title II was first applied to mobile, voice was the predominant mobile service. During the period between 1993 and 2009, carriers invested heavily, including more than $270 billion in building out their wireless networks, an increase of nearly 2,000 percent,” the FCC said.
