Searching for the (SVG) path to enlightenment
January 5, 2012 7:19 AM
SVG Paths Mystery: Are there any widely-used tools or standard 'compact' dialects for representing the contents of the @d attributes of a collection of SVG paths?
I'm asking this because I'm working with a database that contains a compilation of records submitted by a number of different entities (county governments / tax assessors, in this case - may be relevant to the question).
Each record contains a field whose contents are a single delimited string that is obviously a compact representation of something that can be converted into the @d attributes of a collection of SVG paths. I'll call them cShapes (for compact shapes).
The first few cShapes I looked at shared the same delimitation scheme (see first example below), and were pretty easy to convert into a collection of actual SVG paths. I wrote a conversion routine for these cShapes, but, with time, I realized that not all of the cShapes share the same delimitation scheme. "Reverse engineering" each new scheme we come across is time-consuming, and some of the schemes are much harder to reverse engineer than others.
I've come across around 6 or 7 schemes, and I'd like to find out whether there are any standard schemes or widely-used tools that generate such cShapes from a collection of SVG paths. If there's something out there that will help me identify the popular schemes and how they are delimited, that'd save me a ton of work. Plus we keep running into edge cases, whereby our conversion routine for a certain scheme of cShapes comes across an as-yet unhandled character string and doesn't know what to do with it.
Some example cShape strings - I've come across several others, but am not including them here, for the sake of brevity:
1. X,U2DD,L2DM,U2DP,L17DM,U51,R62DM,D5DM,R9DM,D21DM,L24DP,D25DM,L16DM,D2DP,L2DM,D2DM,L10DD
|XU4/L19,U35DD,R47DD,D35,L18DP,L10DP,L19DP
|XU4,U11DM,L7DP,U11DP,R7DM,U13DM,R10DM,D13DP,R7DM,D11DM,L7DP,D11DP,L10DD
|XD4,U4DD,R10DD,D4DM,L10DM
|XR43/U50,U5DD,R9DD,D5DD,L9DD
|XR28/U6,U23DD,R24DD,D23DM,L24DP
|XU55/L19,U20DD,R55,D20,L51DD,U3DL,R30DP,U14,L30DD,D14DD,D3DD,L4DD
|XR28/U19,U7DD,R4DP,D7DP,L4DD
|XU58/L15,U14DD,R30DD,D14DD,L30DD
|XR52/U29,U26DP,L16DP,U8,R42/D4,D30,L26DP
2. A0CU33R40D27L20U08L05D14L15
A01CR20D07L20U07
A02U33R02CU10R10D10L10
A03R20CU06R20D06L20
A04R20CD14R20U14L20
A05R15CU14R05D14L05
3. XSU3SR1,U19,R29,D30,L7,U6,L4,U5,L11,U3,L4,D3,L3DD
|XSU4SR1,MD17,U17,U19,R29,D30,L7,U6,L4,D12DD,L18
|XSD1,MD16,MR18,U11,R4DD,D6DD,R7DD,D5,L11,D4DD,U4DP
|MD20,U3DD,U17,R3,U3DD,R4DD,D3DD,R11DD,D17,D3,L18
|XSR4,MU19,MR8,U10,R10,D10DD,L10DD,L8,R8DD
(linebreaks added for formatting purposes - I added them at the path delimiter in each example)
Potential clue: there's a 'sketch_lang' column in each record in the DB containing the value 'swg'. Yes, that's correct: 'swg', not svg or dwg. Could be a misspelling, or could be a useful clue.
Another potential clue: It seems that delimitation schemes often differ by county, with one particular scheme (first example above) accounting for the large majority of drawings I've taken a look at, so far.
I've taken a look at the dxf specification (PDF) - I guess the code strings could have been generated from .dxf drawings, but again, that's not really helpful unless there are some standard, defined formats for this kind of exported data that I can refer to.
Best answer: It's not any format I recognize. It'd be helpful to understand what those data blobs describe. Are they geographic boundaries, like an ESRI shapefile? You've asked this question in terms of SVG but I don't think there's anything SVG-specific about this at all. In any event the SVG d path format is just a simple vector graphics drawing system.
If it's geographic data you may want to try asking your question in a geo-specific place like GIS StackExchange. The folks who work on GDAL are probably the world's experts on obscure geographic formats.
posted by Nelson at 1:15 PM on January 5, 2012 [1 favorite]
Response by poster: Mike1024: Thanks for the resource. I've done some quick research on SmallWorld - promising lead.
Nelson: I read over the ESRI ShapeFile specification. I'm a GIS noob, really, so am not really familiar with the different formats out there.
Each of the 3 examples I gave represents a 2D, top-down outline or plan of structural improvements on a given parcel of land. There is no lat-lon data included, so I know what the structural improvements look like, but I don't know where they're located on the earth, their compass orientation, the shape of the containing parcel, etc. In short, there's a good possibility that the shapes were pulled from some GIS system, but they contain no location data.
You're right, this may have nothing to do with SVG, other than the fact that SVG is a way to define a graphic in vector terms (as are the strings in the db). Honestly, until I posted this question, my mind had been reading the "lang=swg" as "lang=svg". Additionally, the coordinates map perfectly onto SVG paths that are expressed relatively (whereas ShapeFile PolyLines and Polygons seem to be represented absolutely).
In the 3 examples I gave, each newline would be equivalent to an SVG Path and could be converted into an ESRI ShapeFile PolyLine or Polygon.
I'll take a closer look at GDAL and GIS StackExchange - thanks for the resources.
Additional Info for the curious:
An SVG Path can be represented in several ways - it can be represented absolutely (exactly like the ordered vertices in a ShapeFile PolyLine), or it can be represented relatively (as a starting point followed by an ordered collection of line-to commands - e.g. up 40, right 20). All of the paths I've come across so far are closed, meaning they're also similar to ShapeFile Polygons. Units are in feet, and resolution is one foot.
The path vertices are expressed relatively in all of the strings I've come across, so far.
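The relationship between the relative and absolute forms can be sketched in a few lines of Python. This is only an illustration: the function name `to_absolute` and the `(direction, distance)` tuple layout are made up for the example, and it assumes SVG-style axes where y grows downward.

```python
from itertools import accumulate

# Unit steps per direction letter, assuming SVG-style axes (y grows downward).
STEP = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}

def to_absolute(start, moves):
    """Turn a start point plus relative (direction, distance) moves into absolute vertices."""
    deltas = [(STEP[d][0] * n, STEP[d][1] * n) for d, n in moves]
    # Running sum of the deltas, seeded with the start point (needs Python 3.8+).
    return list(accumulate(deltas, lambda p, dl: (p[0] + dl[0], p[1] + dl[1]), initial=start))

# "up 40, right 20" starting from the origin
print(to_absolute((0, 0), [("U", 40), ("R", 20)]))  # [(0, 0), (0, -40), (20, -40)]
```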
A simple example:
X starts a new Path or PolyLine
'|' separates two Paths/PolyLines
',' separates LineTo commands
'/' separates X,Y coordinates where a LineTo command moves diagonally (so not just along the X or Y axis)
D = Down, R = Right, U = Up, L = Left
In SVG 0,0 is generally the top left corner (which seems to be the case for these strings, as well)
Raw Format:
X,D10,R10,U10,L10|XD1/R1,D8,R8,U8,L8
Translates to the following two SVG paths, expressed relatively:
<path d="m 0 0 l 0 10 l 10 0 l 0 -10 l -10 0"/>
<path d="m 1 1 l 0 8 l 8 0 l 0 -8 l -8 0"/>
Also translates to the following SVG polygons, expressed absolutely (closer to the ShapeFile PolyLine and Polygon formats):
<polygon points="0,0 0,10 10,10 10,0 0,0"/>
<polygon points="1,1 1,9 9,9 9,1 1,1"/>
Converted to a graphic, you'd get two concentric squares. The outer square starts at 0,0 and is 10 units per side. The inner square starts at 1,1 and is 8 units per side. (Graphic at imgbin.org)
Note for posterity: I've simplified the above SVG representations for the sake of brevity here.
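The rules listed above can be turned into a small converter. The following is a minimal Python sketch that covers only the simplified scheme described in this comment; the names `offset` and `cshape_to_paths` are invented for the example. It assumes a bare X starts at 0,0 and that anything after the X is the starting offset. It does not try to interpret the trailing codes that appear in the real data (DD, DM, DP and so on), since their meaning isn't known; those raise an error instead of being guessed at.

```python
import re

# Unit steps, assuming SVG-style axes with the origin at the top left (y grows downward).
STEP = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}
COMPONENT = re.compile(r"([UDLR])(\d+)")

def offset(token):
    """Convert 'D10' or 'D1/R1' into a (dx, dy) pair; an empty token means no movement."""
    dx = dy = 0
    for part in (token.split("/") if token else []):
        m = COMPONENT.fullmatch(part)
        if m is None:
            # Surface anything the sketch does not understand instead of guessing.
            raise ValueError(f"unhandled token: {token!r}")
        sx, sy = STEP[m.group(1)]
        dx += sx * int(m.group(2))
        dy += sy * int(m.group(2))
    return dx, dy

def cshape_to_paths(cshape):
    """Return one relative SVG path 'd' string per '|'-separated segment."""
    paths = []
    for segment in cshape.split("|"):
        start, *moves = segment.split(",")
        if not start.startswith("X"):
            raise ValueError(f"segment does not start with X: {segment!r}")
        x, y = offset(start[1:])  # whatever follows X is the starting offset
        commands = [f"m {x} {y}"] + ["l {} {}".format(*offset(mv)) for mv in moves]
        paths.append(" ".join(commands))
    return paths

for d in cshape_to_paths("X,D10,R10,U10,L10|XD1/R1,D8,R8,U8,L8"):
    print(f'<path d="{d}"/>')
```

Raising on unknown tokens is deliberate: with several undocumented schemes in the same column, a loud failure is easier to track down than a silently wrong outline.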
posted by syzygy at 4:36 AM on January 7, 2012
Response by poster: Nelson:
Some really good resources linked from GDAL - I'd looked at some specifications from the Open Geospatial Consortium, but hadn't seen this one yet: OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture. The GDAL/OGR library was inspired by the linked spec.
Doesn't answer my question here, but this spec is an excellent resource for me, anyway, since I'm working with a subset of the operations and concepts it describes in the application I'm writing.
Interestingly, most of the vector formats I'm coming across seem to work with absolute positioning, only. SVG is the only one I've seen so far (in my non-exhaustive search) where shapes can be defined via absolute or relative positioning.
posted by syzygy at 5:31 AM on January 7, 2012
The lack of any location data makes me think more of a CAD system than a GIS system. I took a quick look and your data doesn't look like AutoCAD's DXF or DWG, nor SketchUp's SKP. But I'm no CAD expert. Maybe some CAD expert would recognize the format.
One thing that's striking about your examples is they are all specified with small integers. Most shape formats I know of work in floating point and have awkward precision. I.e., instead of U33R40 you'd have something like U3.3153R3.984.
posted by Nelson at 7:47 AM on January 7, 2012
Response by poster: Thanks for the help.
I've asked around and done some more research, but still haven't found much. I decided to convert the strings into something usable with regular expressions for the time being.
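The regular expressions themselves aren't shown in this thread. As a rough sketch of the general idea, the Python below splits a string into letter runs, numbers and delimiters without assigning any meaning to them, and fails loudly on a character it doesn't know. The `TOKEN` pattern and the `tokenize` name are assumptions made for the example, based only on the characters visible in the sample strings above.

```python
import re

# Letter runs, digit runs, and the three delimiters seen in the sample strings.
TOKEN = re.compile(r"(?P<letters>[A-Z]+)|(?P<number>\d+)|(?P<delim>[|,/])")

def tokenize(cshape):
    """Split a cShape string into (kind, text) tokens, failing loudly on unknown characters."""
    tokens, pos = [], 0
    while pos < len(cshape):
        m = TOKEN.match(cshape, pos)
        if m is None:
            raise ValueError(f"unhandled character {cshape[pos]!r} at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("A0CU33R40"))
# [('letters', 'A'), ('number', '0'), ('letters', 'CU'), ('number', '33'), ('letters', 'R'), ('number', '40')]
```

A token stream like this is the same for every scheme, so each scheme's interpretation can be layered on top of it separately.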
posted by syzygy at 8:54 AM on January 12, 2012
posted by Mike1024 at 12:58 PM on January 5, 2012