JSFileApi: Difference between revisions

From MozillaWiki
No edit summary
 
(46 intermediate revisions by 3 users not shown)
Line 1: Line 1:
This page details a proposal for [https://bugzilla.mozilla.org/show_bug.cgi?id=563742|Bug 563742 - Efficient ctypes API for file handling].
This page details a proposal for [https://bugzilla.mozilla.org/show_bug.cgi?id=563742 Bug 563742 - Efficient ctypes API for file handling].
 
'''Note''' This is an early draft. The final API is documented [https://developer.mozilla.org/en/JavaScript_OS.File on MDN].


The general idea of this API is to provide a low-level, cross-platform, fast access to file management functions. For this reason, it does not implement some primitives that are very different between platforms, e.g. chmod, mmap, epoll.
The general idea of this API is to provide a low-level, cross-platform, fast access to file management functions. For this reason, it does not implement some primitives that are very different between platforms, e.g. chmod, mmap, epoll.


By the way, this document uses the conventions of [http://code.google.com/closure/compiler/docs/js-for-compiler.html|the Google Closure Compiler] for  type annotations.
'''Conventions''' This document uses the conventions of [http://code.google.com/closure/compiler/docs/js-for-compiler.html the Google Closure Compiler] for  type annotations.
 
= Problems addressed by this API  =
 
The current API for file management is based on <tt>nsIFile</tt> and its various <tt>nsLocalFile*</tt> implementations. Several performance-related concerns have surfaced wrt the <tt>nsIFile</tt> API:
 
*the internal implementation is based on <tt>string</tt> paths, rather than file descriptors, which in turn causes repeated filename lookups during the execution of any non-trivial sequence of operations;
*it causes numerous expensive calls to <tt>stat</tt>, both when traversing a directory and when a user is interested in several properties of a file;&nbsp;
*input/output streams do not play too nicely with JavaScript.
 
'''Note''' Do we know how expensive the XPCom/XPConnect overhead is, by opposition to js-ctypes and JSAPI?
 
The present proposal attempts to address these points as follows:
 
*it attempts to minimize filename lookups by using file/directory descriptors wherever possible;
*it attempts to minimize calls to <tt>stat</tt>, in particular when traversing a directory;
*input/output attempts to play much more nicely with JavaScript, by using JavaScript low-level data structures designed for this task (and which didn't exist when the <tt>nsIFile</tt> API was designed).
 
= Problems not addressed by this API =
 
This API is synchronous. This API does not attempt to solve any asynchronicity issue. Consequently, this API is not meant to be used in the main thread. See [https://bugzilla.mozilla.org/show_bug.cgi?id=691309|bug 691309] for one discussion (among many) on how to remove IO from the main thread.
 
Note that it is quite possible to build an asynchronous file API based on this API. This is quite simple, and the process of building an async API on top of a sync API may be the best way to obtain within a finite time a cross-platform asynchronous API that works.
 


= Module FileUtilities =
= Module <tt>Files</tt> =
This module is the first access point to the file API. It contains constructors, functions to copy, move files, etc. as well as the constants used in the API.
This module is the first access point to the file API. It contains constructors, functions to copy, move files, etc. as well as the constants used in the API.




== Access a file or a directory ==
== Access a file or a directory ==
   
   
     /**
     /**
       * Open a file
       * Create a temporary file in the system-defined temporary directory (if it exists). This file is deleted when the process closes or when the file is closed, whichever happens first.
      *
      * @param {string} fileName A platform-dependent name. To create a file in a directory, use [DirectoryDescriptor.openFile].
      * @param {number} accessMode A or-ing of flags, as specified by [FileUtilities.Open.Access].
      * @param {number} contentMode A or-ing of flags, as specified by [FileUtilities.Open.Content]
      * @param {number} pragmaMode A or-ing of flags, as specified by [FileUtilities.Open.Pragma]
      * @return {FileDescriptor} a FileDescriptor
       *
       *
       * @throws FileDescriptorError
       * @param {string=} name An optional name that can be used for debugging purposes as a template for the name of the file created.
       */
       */
     openFile: function(fullName, accessMode, contentMode, pragmaMode) {
     createTempFile: function(name) {
  //Unix:   maps to [open]
  //Unix:       uses cached [nsIDirectoryService] to get temporary directoy, [mkstemp] and [this.openFile]
  //Windows: maps to [CreateFile]
  //Windows:     uses cached [nsIDirectoryService] to get temporary directory, [GetTempFileName] + [CreateFile] http://msdn.microsoft.com/en-us/library/windows/desktop/aa363875%28v=vs.85%29.aspx
//Windows note: we probably want flags FLAG_SHARE_READ, FLAG_SHARE_WRITE, FLAG_SHARE_DELETE to allow moving files that are currently open
     },
     },
   
   
    /**
== General utilities ==
      * Create a temporary file. This file is deleted when the process closes, when the file is closed or when the FileDescriptor is garbage-collected.
      *
      * @return {FileDescriptor} a FileDescriptor
      */
    createTempFile: function() {
//Unix:    [FileUtilities.tmpdir] followed by [mkstemp] and [FileUtilities.openFile]
//Windows: [FileUtilities.tmpdir] followed by [GetTempFileName] and [FileUtilities.openFile]
//Alternative version:
//Unix minus Android: maps to [tmpfile]
//Android:            need to implement custom [tmpfile]
//Windows:            maps to [GetTempPath] + [GetTempFileName] + [CreateFile] http://msdn.microsoft.com/en-us/library/windows/desktop/aa363875%28v=vs.85%29.aspx
    },
    /**
      * @param {string} fullName The platform-specific name of the directory.
      * @param {number=} accessMode A or-ing of flags, as specified by [FileDescriptor.OpenDir.Access]
      *
      * @returns {DirectoryDescriptor} a descriptor which may be used to access this directory
      */
    openDirectory: function(fullName, accessMode) {
//Unix:    lazy
//Windows: lazy
    },
    /**
      * Create a temporary directory.
      *
      * Note: For the time being, there is no guarantee that the temporary directory will be cleaned
      *
      * @returns {DirectoryDescriptor} a descriptor which may be used to access this directory
      */   
    createTempDirectory: function() {
//Unix:    [FileUtilities.tmpdir] followed by [mkstemp] and [FileUtilities.openDirectory]
//Windows: [FileUtilities.tmpdir] followed by [GetTempFileName] and [FileUtilities.openDirectory]
    },
   
   
     /**
     /**
       * Open a copy of a file.
       * Copy a file.
       *
       *
       * If neither [directory] nor [name] is provided, the destination file is first created as with [createTemp]
       * Note: OS-accelerated on some platforms.
      * Note: Attempts to preserve permissions, attributes, etc. However,
      * some file systems do not support this.
       *
       *
      * @param {DirectoryDescriptor=} destination Optionally, the directory in which to place the file.
      * @param {string=} name Optionally, the name of the file in the directory.
      * @return {FileDescriptor} The copy.
      */
    openFileCopy: function(directory, name) {
//Unix:    need to implement -- note that some file systems support an [ioctl] for copy-on-write.
//Windows:  maps to [CopyFile] http://msdn.microsoft.com/en-us/library/aa363851(v=VS.85).aspx
    },
== General utilities ==
    /**
       * @param {string} source The name of the file/directory to copy.
       * @param {string} source The name of the file/directory to copy.
       * @param {string} target The name of the file/directory to be created.
       * @param {string} target The name of the file/directory to be created.
       * @throws FileDescriptorError
      * @param {boolean} overwrite If [false] and if the target already exists, fail.
       * @throws RawFileError
       */
       */
     copy: function(source, target)
     copy: function(source, target, overwrite)
     {
     {
  //Unix:    need to implement with [open], [read], [write], [close]
  //Unix:    need to implement with [open], [read], [write], [close]
        //          TODO: check for existing library routines in Glib / Qt
  //Windows:  maps to [CopyFile] http://msdn.microsoft.com/en-us/library/aa363851(v=VS.85).aspx
  //Windows:  maps to [CopyFile] http://msdn.microsoft.com/en-us/library/aa363851(v=VS.85).aspx
             //Check if it works with directories
             //Check if it works with directories
Line 96: Line 73:
   
   
     /**
     /**
      * Move a file or a directory.
      *
      * Note: OS-accelerated whenever possible.
      * Note: Attempts to preserve permissions, attributes, etc. However,
      * some file systems do not support this.
      *
       * @param {string} source The name of the file/directory to move.
       * @param {string} source The name of the file/directory to move.
       * @param {string} target The name of the file/directory to be created.
       * @param {string} target The name of the file/directory to be created.
       * @throws FileDescriptorError
      * @param {boolean} overwrite If [false] and if the target already exists, fail.
       * @throws RawFileError
       */
       */
     move: function(source, target)
     move: function(source, target, overwrite)
     {
     {
  //Unix:    maps to [rename]
  //Unix:    maps to [rename] or, when [rename] returns EXDEV, on [Files.copy]+[Files.remove]
  //Windows:  maps to [MoveFile] http://msdn.microsoft.com/en-us/library/aa365239(v=VS.85).aspx
  //Windows:  maps to [MoveFile] http://msdn.microsoft.com/en-us/library/aa365239(v=VS.85).aspx
            //Check if it works with directories
     },
     },
    
    
     /**
     /**
       * Remove a file/directory.
       * Remove a file.
      *
      * Note: OS-accelerated under some platforms.
       */
       */
     remove: function(name)
     remove: function(name)
Line 117: Line 104:
== Constants ==
== Constants ==
   
   
    get stdin:  function() {
 
//...
    },
    get stdout: function() {
//...
    },
    get stderr: function() {
//...
    },
     /**
     /**
       * Return the location of the directory/folder used to store temporary files.
       * Return the location of the directory/folder used to store user profile
       *
       *
       * Computed lazily.
       * Computed lazily, cached.
       *
       *
       * @return {DirectoryDescriptor}
       * @return {RawDir}
       */
       */
     get tmpdir: function () {
     get profileDir() {
  //All platforms: use [nsIDirectoryService] [NS_OS_TEMP_DIR] to get the directory the first time
  //All platforms: use [nsIDirectoryService] to get the directory the first time, cache it.
//Alternative solution:
//Unix minus Android: maps to [getenv] for "TMPDIR"
//Android:            TODO - probably somewhere in the Moz preferences directory - check with nsIDirectoryService
//Windows:            maps to [GetTempPath] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364992%28v=vs.85%29.aspx
  //Note: Perhaps we should check with nsIDirectoryServi
     },
     },
   
 
    //TODO: Add other well-known directories.
 
== Flags ==
== Flags ==
   
   
Line 168: Line 143:
             /** Create file if it doesn't exist*/
             /** Create file if it doesn't exist*/
      MAY_CREATE:...,
      MAY_CREATE:...,
            /** Create file; fail if the file already exists */
            MUST_CREATE:...,
             /** Write at the start of file if it exists. If not specified, append.*/
             /** Write at the start of file if it exists. If not specified, append.*/
      OVERWRITE: ...,
      OVERWRITE: ...,
Line 176: Line 153:
  */
  */
  Pragma: {
  Pragma: {
            /** Windows-specific pragma: use Posix-style file names, i.e. two file names who differ only in case should not be collapsed*/
    POSIX_SEMANTICS:    ...,
   
   
             /** Windows-specific pragma: optimize cache for sequential access*/
             /** Windows-specific pragma: optimize cache for sequential access*/
Line 217: Line 192:


== Interfaces ==
== Interfaces ==
=== File Information ===
   
   
     /**
     /**
       * The kind of information that can be found by calling [FileDescroptor.info] or [DirectoryDescriptor.forEachFile].
       * The kind of information that can be found by calling [RawFile.info] or [RawDir.contents].
       *
       *
       * Note that some or all fields may be computed lazily.
       * Note that some or all fields may be computed lazily.
      * This makes it possible to fill in fields whose value
      * has already been computed (typically, by directory
      * traversal), without performing a system
      * call to get the value of the other fields.
       *
       *
       * @interface
       * @interface
       */
       */
     FileInfo: {
     FileInfo: {  
/**
* @return {FileDescriptor} the file descriptor for this file
*/
get descriptor(): {
    ...
},
  /**
  /**
  * @return {number} milliseconds
  * @return {number} milliseconds
Line 242: Line 215:
  * @return {number} bytes
  * @return {number} bytes
  */
  */
  get fileSize() : {
  get size() : {
      ...
      ...
  },
  },
   
   
/**
        /**
* @return {boolean}
        * Note: this property is OS-accelerated for entries returned by [forEachFile] or by enumerating files in a directory.
*/
        *
get isExecutable(): {
        * @return {boolean}
    //Windows: [GetBinaryType] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364819(v=VS.85).aspx
        */
    //Unix:    contents of [stat]
        get isDirectory(): {
},
           
        }
     },
     },
=== Directory entries ===
    DirEntry: {
        /**
        * The name of the file.
        *
        * Note that there is no guarantee that the file still exists by the time you attempt to open it.
        *
        * @return {string=}
        */
        get name(): { ...
        },
   
   
        /**
        * All the information that could be gathered about the file without opening it.
        *
        * @return {FileInfo}
        */
        get info(): { ...
            //Precomputed by [RawDir.contents]
        }
    },
=== Errors ===
     /**
     /**
       * An exception launched by this module.
       * An exception launched by this module.
      *
      * TODO: Specify.
       *
       *
       * @constructor
       * @constructor
Line 263: Line 262:
     Error: function(){}
     Error: function(){}


= Instances of FileDescriptor =  
= Instances of <tt>RawFile</tt> =  


A {FileDescriptor} is a low-level object wrapping a native file descriptor (under variants of Unix) or a file handle (under Windows).
A <tt>RawFile</tt> is a low-level object wrapping a native file descriptor (under variants of Unix) or a file handle (under Windows).


== Reading ==
== Reading ==
Line 274: Line 273:
       * @param {ArrayBuffer} buf The buffer which will receive the data.
       * @param {ArrayBuffer} buf The buffer which will receive the data.
       * @param {number} offset The position in the array at which to start putting data, in bytes.
       * @param {number} offset The position in the array at which to start putting data, in bytes.
       * @param {number} size The maximal number of bytes to read. This method can read less bytes if the file is shorter.
       * @param {number} size The maximal number of bytes to read. This method can read less bytes if (and only if) the file is shorter.
       * @return {number} The number of bytes read.
       * @return {number} The number of bytes read.
       * @throws {FileDescriptorException} In case of error.
       * @throws {RawFileException} In case of file error.
      * @throws {INDEX_SIZE_ERR} In case of array error.
       */
       */
     read: function(buf, offset, size) {
     read: function(buf, offset, size) {
Line 284: Line 284:
   
   
     /**
     /**
       * As [read], but returns a string instead of filling a buffer.
       * As [read], but read from a given position and do not advance.
       *
       *
       * @return {string} The content read from the file. Note that its length may be shorter than [size]
      * @param {number} fileOffset The position in the file from which to read.
       * @throws {FileDescriptorException} In case of error.
      * @param {ArrayBuffer} buf The buffer which will receive the data.
      * @param {number} offset The position in the array at which to start putting data, in bytes.
      * @param {number} size The maximal number of bytes to read. This method can read less bytes if the file is shorter.
       * @return {number} The number of bytes read.
      * @throws {RawFileException} In case of file error.
       * @throws {INDEX_SIZE_ERR} In case of array error.
       */
       */
     readString: function(size) {
     pread: function(fileOffset, buf, offset, size) {
//as [read]
    },
    /**
      * As [read], but do not advance
      */
    pread: function(...) {
  //Unix:    [pread]
  //Unix:    [pread]
  //Windows:  [ReadFile] + [SetFilePointer] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365541(v=VS.85).aspx
  //Windows:  [ReadFile] + [SetFilePointer] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365541(v=VS.85).aspx
     },
     },
 
    /**
      * As [pread], but returns a string instead of filling a buffer.
      *
      * @return {string} The content read from the file. Note that its length may be shorter than [size]
      * @throws {FileDescriptorException} In case of error.
      */
    preadString: function(size) {
//as [pread]
    },
== Writing ==
== Writing ==
   
   
Line 317: Line 305:
       *
       *
       * @param {ArrayBuffer} buf The buffer containing the data.
       * @param {ArrayBuffer} buf The buffer containing the data.
       * @param {number} offset The position in the array at which the data starts, in bytes.
       * @param {number=} offset The position in the array at which the data starts, in bytes. If unspecified, 0.
       * @param {number} size The maximal number of bytes to read. This method can write less bytes, depending on buffering.
       * @param {number=} size The maximal number of bytes to read. This method can write less bytes, depending on buffering. If unspecified, everything from [offset].
       *
       *
       * @return {number} The number of bytes written.
       * @return {number} The number of bytes written.
       * @throws {FileDescriptorException} In case of error.
       * @throws {RawFileException} In case of file error.
      * @throws {INDEX_SIZE_ERR} In case of array error.
       */
       */
     write: function(buf, offset, size) {
     write: function(buf, offset, size) {
Line 327: Line 316:
  //Windows: [WriteFile] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365747%28v=VS.85%29.aspx
  //Windows: [WriteFile] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365747%28v=VS.85%29.aspx
     },
     },
 
    /**
      * As [write], but with a [string]
      */
    writeString: function(buf, offset, size) {
//as [write]
    },
     /**
     /**
       * As [write] but do not advance
       * As [write], but write to a specific position and do not advance
       */
       */
     pwrite: function(buf, offset, size) {
     pwrite: function(fileOffset, buf, offset, size) {
  //Unix: [pwrite]
  //Unix: [pwrite]
  //Windows: [WriteFile] + [SetFilePointer]
  //Windows: [WriteFile] + [SetFilePointer]
     },
     },
    pwriteString: function(buf, offset, size) {
 
//as [pwrite]
    },
== Attributes ==
== Attributes ==
   
   
Line 351: Line 330:
       * Gather information about the file
       * Gather information about the file
       *
       *
       * @return {FileUtilities.FileInfo} information about the file.
       * @return {Files.FileInfo} information about the file.
       */
       */
     stat: function() {
     stat: function() {
Line 374: Line 353:
       *
       *
       * @param {number} delta Number of bytes. Can be positive or negative.
       * @param {number} delta Number of bytes. Can be positive or negative.
       * @param {FileDescriptor.Seek.Methodmethod} Determine whether [delta] is to be taken from the start of the file, from the end or from the current position.
       * @param {RawFile.Seek.Methodmethod} Determine whether [delta] is to be taken from the start of the file, from the end or from the current position.
       */
       */
     seek: function(delta, method) {
     seek: function(delta, method) {
Line 398: Line 377:
  //Windows: [FlushFileBuffers] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=VS.85).aspx
  //Windows: [FlushFileBuffers] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=VS.85).aspx
     },
     },
== Not implemented ==
* chmod, chown -- very different between platforms - might implement platform-specific functions
* select, poll, ... -- very different between platforms, higher level
* mmap -- probably feasible, just might require additional API
* locking -- very different between platforms, most likely deserves its own API
* linking -- very different between platforms


= Instances of DirectoryDescriptor =
= Instances of <tt>RawDir</tt> =


A {DirectoryDescriptor} is a slightly higher-level object wrapping a directory _name_ (for reasons of portability & iteration, this seemed more appropriate than _opening_ the directory during construction). On the Unix side, some of the methods rely upon (or have to reimplement) systems that obey recent versions of Posix, with functions such as {openat}.
A <tt>RawDir</tt> is an object wrapping either a directory ''name'' or a directory descriptor, depending on the platform. On the Unix side, some of the methods rely upon (or have to reimplement) systems that obey recent versions of Posix, with functions such as <tt>openat</tt>.


== Opening/creating ==
== Opening/creating ==
Line 417: Line 388:
       *
       *
       * @param {string} leafName The name of the file.
       * @param {string} leafName The name of the file.
       * @param {number} accessMode A or-ing of flags, as specified by [FileDescriptor.Open.Access].
       * @param {number=} accessMode A or-ing of flags, as specified by [RawFile.Open.Access]. If this argument is not provided, we assume 0, i.e. no flags.
       * @param {number} contentMode A or-ing of flags, as specified by [FileDescriptor.Content.Access]
       * @param {number=} contentMode A or-ing of flags, as specified by [RawFile.Content.Access]. If this argument is not provided, we assume 0, i.e. no flags.
       * @param {number} pragmaMode A or-ing of flags, as specified by [FileDescriptor.Pragma.Access]
       * @param {number=} pragmaMode A or-ing of flags, as specified by [RawFile.Pragma.Access]. If this argument is not provided, we assume 0, i.e. no flags.
       * @return {FileDescriptor} a FileDescriptor
       * @return {RawFile} a RawFile
       *
       *
       * @throws FileDescriptorError
       * @throws RawFileError
       */
       */
     openFile: function(leafName, accessMode, contentMode, pragmaMode) {
     openFile: function(leafName, accessMode, contentMode, pragmaMode) {
  //Unix:    maps to [openat] (warning, this requires gnulib on non-Linux platforms)
  //Linux:    [openat]
  //Windows:  cf. [FileDescriptor.open]
        //Unix:      decide between gnulib [openat] and simply [open]
  //Windows:  cf. [RawFile.open]
     },
     },
   
   
     /**
     /**
       * Create a temporary file in this directory. This file is deleted when the process closes, when the file is closed.
       * Create a temporary file in this directory. This file is deleted when the process closes or when the file is closed, whichever happens first.
       */
       */
     createTempFile: function() {
     createTempFile: function() {
Line 441: Line 413:
       *
       *
       * @param {string} leafName The platform-specific name of the directory.
       * @param {string} leafName The platform-specific name of the directory.
       * @param {number=} accessMode A or-ing of flags, as specified by [FileDescriptor.OpenDir.Access]
       * @param {number=} accessMode A or-ing of flags, as specified by [RawFile.OpenDir.Access]
       *
       *
       * @returns {DirectoryDescriptor} a descriptor which may be used to access this directory
       * @returns {RawDir} a descriptor which may be used to access this directory
       */
       */
     openDirectory: function(leafName, accessMode) {
     openDirectory: function(leafName, accessMode) {
  //Unix:    lazy
  //Unix:    lazy -- may call [openat]
  //Windows: lazy
  //Windows: lazy
     },
     },
   
   
     /**
     /**
       * Create a temporary directory.
       * Create a temporary subdirectory.
       *
       *
       * Note: For the time being, there is no guarantee that the temporary directory will be cleaned
       * Note: For the time being, there is no guarantee that the temporary directory will be cleaned
       *
       *
       * @returns {DirectoryDescriptor} a descriptor which may be used to access this directory
       * @returns {RawDir} a descriptor which may be used to access this directory
       */     
       */     
     createTempDirectory: function()
     createTempDirectory: function()
Line 461: Line 433:
     },
     },


== Accessing contents ==
== Information ==
    /**
      * Gather information about the directory
      *
      * @return {Files.FileInfo} information about the file.
      */
    stat: function() {
//Unix:    [lstat]
//Windows: [GetFileInformationByHandle] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=VS.85).aspx
    },
 
== Browsing contents ==
   
   
     /**
     /**
       * Apply a treatment to all files in the directory.
       * Get the contents of the directory.
      *
      * Note: objects of type DirectoryDescriptor are iterable. Therefore, you can also loop through them using a standard [for..in].
      *
      * @param {(string|regexp|(function(string): boolean)) =} filter. If a [string], uses platform-specific filtering. If a regexp or a function, uses high-level filtering.
      * @param {function(string, FileDescriptor.FileInfo, number)} onFile  A function called for each file in the directory, with the name of the file, a (lazy) file info for that file and a file number. If the function returns anything [null], the loop stops immediately and returns the value returned by that function.
       *
       *
       * @returns The first value returned by [onFile], or [undefined] otherwise.
       * @param {string=} filter. If provided, uses OS-accelerated, platform-specific, filtering, where available.
      * @return {Array.<Files.DirEntry>} The list of files of the directory that match the filter.
       */
       */
     forEachFile: function(filter, onFile) {
     contents: function(filter) {    
  //Unix:    maps to [opendir], [dfd], [readdir]/[readdir64], lazy calls to [stat]
  //Unix:    maps to [opendir], [dfd], [readdir]/[readdir64], lazy calls to [stat], lazy calls to [openat]/[open], [closedir]
  //Windows: maps to [FindFirstFile], [FindNextFile], [Close]
  //Windows: maps to [FindFirstFile], [FindNextFile], [Close]
    }
= Not implemented =
* chmod, chown -- very different between platforms - might implement platform-specific functions
* select, poll, ... -- very different between platforms, higher level
* mmap -- probably feasible, just might require additional API
* locking -- very different between platforms, most likely deserves its own API
* linking -- very different between platforms
* readString, writeString -- ArrayBuffer <-> String conversion most likely deserves its own API
* opening a file or directory from a full path -- error-prone, difficult to optimize, favors hardcoding non-portable paths -- also, we intend to use this API mostly to access files in well-known directories;
* accessing the temporary directory -- it doesn't exist on Android, and emulating would require heavy scaffolding.
= Implementation notes =
* For the moment, the JS team does not recommend using js-ctypes for performance-critical code. Rather, they recommend using JS API, so this is probably the right way to go.
* This is JS code, so by definition not thread-safe.
* Depending on demands by API users, a C++ version may be produced. In this case, we will probably want to make it MT-safe.

Latest revision as of 14:33, 12 January 2013

This page details a proposal for Bug 563742 - Efficient ctypes API for file handling.

Note This is an early draft. The final API is documented on MDN.

The general idea of this API is to provide low-level, cross-platform, fast access to file management functions. For this reason, it does not implement some primitives that differ widely between platforms, e.g. chmod, mmap, epoll.

Conventions This document uses the conventions of the Google Closure Compiler for type annotations.

Problems addressed by this API

The current API for file management is based on nsIFile and its various nsLocalFile* implementations. Several performance-related concerns have surfaced with respect to the nsIFile API:

  • the internal implementation is based on string paths, rather than file descriptors, which in turn causes repeated filename lookups during the execution of any non-trivial sequence of operations;
  • it causes numerous expensive calls to stat, both when traversing a directory and when a user is interested in several properties of a file; 
  • input/output streams do not play too nicely with JavaScript.

Note Do we know how expensive the XPCOM/XPConnect overhead is, compared with js-ctypes and JSAPI?

The present proposal attempts to address these points as follows:

  • it attempts to minimize filename lookups by using file/directory descriptors wherever possible;
  • it attempts to minimize calls to stat, in particular when traversing a directory;
  • input/output attempts to play much more nicely with JavaScript, by using JavaScript low-level data structures designed for this task (and which didn't exist when the nsIFile API was designed).
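The second point, avoiding redundant calls to stat, can be illustrated with a small sketch. It is only an assumption about how a lazy file-information object might be built: makeFileInfo and statOnce are hypothetical names, and only the field names size and isDirectory are taken from this draft. Fields already known from directory traversal are returned directly, and the system call is issued at most once, on first access to any other field.

```javascript
// Hypothetical helper: `known` holds fields already obtained for free
// (e.g. during directory traversal); `statOnce` performs the expensive call.
function makeFileInfo(known, statOnce) {
  let stats = null; // filled in on first access to a field not in `known`
  const lazy = (field) => {
    if (field in known) return known[field];
    if (stats === null) stats = statOnce();
    return stats[field];
  };
  return {
    get isDirectory() { return lazy("isDirectory"); },
    get size() { return lazy("size"); },
  };
}

// Usage: the directory listing already told us this entry is not a directory.
let statCalls = 0;
const info = makeFileInfo({ isDirectory: false }, () => {
  statCalls++;
  return { isDirectory: false, size: 1024 };
});
```

Reading info.isDirectory costs nothing here, while reading info.size triggers the single deferred call.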

Problems not addressed by this API

This API is synchronous and does not attempt to solve any asynchronicity issue. Consequently, it is not meant to be used on the main thread. See bug 691309 for one discussion (among many) on how to remove IO from the main thread.

Note that it is quite possible to build an asynchronous file API on top of this one. Doing so is fairly simple, and layering an asynchronous API over a synchronous one may be the quickest way to obtain a cross-platform asynchronous API that works.
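One possible shape for such a layer is sketched below, under stated assumptions: the synchronous calls run on a worker, and the main thread only exchanges messages with it. Everything here is hypothetical (syncFiles, handleRequest, createAsyncFiles and the message format are not part of this proposal), and the worker channel is replaced by an in-process stand-in so that the sketch is self-contained.

```javascript
// Hypothetical synchronous back-end standing in for the API described here.
const syncFiles = {
  copy(source, target, overwrite) { return `${source} -> ${target}`; },
};

// Worker side: run the requested synchronous call and send back the outcome.
function handleRequest(message, reply) {
  try {
    const result = syncFiles[message.method](...message.args);
    reply({ id: message.id, ok: true, result });
  } catch (error) {
    reply({ id: message.id, ok: false, error: String(error) });
  }
}

// Main-thread side: each call becomes a message plus a pending promise.
function createAsyncFiles(post) {
  let nextId = 0;
  const pending = new Map();
  const onReply = (message) => {
    const { resolve, reject } = pending.get(message.id);
    pending.delete(message.id);
    if (message.ok) resolve(message.result);
    else reject(new Error(message.error));
  };
  const call = (method, ...args) => new Promise((resolve, reject) => {
    const id = nextId++;
    pending.set(id, { resolve, reject });
    post({ id, method, args }, onReply);
  });
  return {
    copy: (source, target, overwrite) => call("copy", source, target, overwrite),
    pendingCount: () => pending.size,
  };
}

// In-process stand-in for a real worker channel (an assumption of this sketch).
const asyncFiles = createAsyncFiles((message, onReply) =>
  setTimeout(() => handleRequest(message, onReply), 0));
```

With a real worker, the post function would call postMessage and the replies would arrive through an onmessage handler; the bookkeeping of pending requests by id stays the same.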


Module Files

This module is the first access point to the file API. It contains constructors, functions to copy, move files, etc. as well as the constants used in the API.



Access a file or a directory

    /**
     * Create a temporary file in the system-defined temporary directory (if it exists). This file is deleted when the process closes or when the file is closed, whichever happens first.
     *
     * @param {string=} name An optional name that can be used for debugging purposes as a template for the name of the file created.
     */
    createTempFile: function(name) {
	//Unix:        uses cached [nsIDirectoryService] to get temporary directory, [mkstemp] and [this.openFile]
	//Windows:     uses cached [nsIDirectoryService] to get temporary directory, [GetTempFileName] + [CreateFile] http://msdn.microsoft.com/en-us/library/windows/desktop/aa363875%28v=vs.85%29.aspx
    },


General utilities

    /**
     * Copy a file.
     *
     * Note: OS-accelerated on some platforms.
     * Note: Attempts to preserve permissions, attributes, etc. However,
     * some file systems do not support this.
     *
     * @param {string} source The name of the file/directory to copy.
     * @param {string} target The name of the file/directory to be created.
     * @param {boolean} overwrite If [false] and if the target already exists, fail.
     * @throws RawFileError
     */
    copy: function(source, target, overwrite)
    {
        //Unix:     need to implement with [open], [read], [write], [close]
        //          TODO: check for existing library routines in Glib / Qt
        //Windows:  maps to [CopyFile] http://msdn.microsoft.com/en-us/library/aa363851(v=VS.85).aspx
        //          Check if it works with directories
    },
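
A minimal sketch of the read/write loop that a Unix implementation of copy would need, assuming objects that follow the (buf, offset, size) contract of the read and write methods described in this proposal. The helpers copyContents, memSource and memSink are hypothetical and work purely in memory; a real implementation would operate on file descriptors and would also have to handle permissions and attributes.

```javascript
// Copy everything from `source` to `target` through a fixed-size buffer.
function copyContents(source, target, chunkSize) {
  const buf = new ArrayBuffer(chunkSize);
  let total = 0;
  for (;;) {
    const bytesRead = source.read(buf, 0, chunkSize);
    if (bytesRead === 0) {
      break; // end of file
    }
    // A write may accept fewer bytes than requested, so loop until the
    // whole chunk has been written.
    let written = 0;
    while (written < bytesRead) {
      const n = target.write(buf, written, bytesRead - written);
      if (n <= 0) {
        throw new Error("write made no progress");
      }
      written += n;
    }
    total += bytesRead;
  }
  return total;
}

// In-memory stand-in for a readable file (illustration only).
function memSource(bytes) {
  let pos = 0;
  return {
    read(buf, offset, size) {
      const n = Math.min(size, bytes.length - pos);
      new Uint8Array(buf, offset, n).set(bytes.subarray(pos, pos + n));
      pos += n;
      return n;
    }
  };
}

// In-memory stand-in for a writable file that accepts at most
// `maxPerWrite` bytes per call, to exercise the partial-write path.
function memSink(maxPerWrite) {
  const out = [];
  return {
    out,
    write(buf, offset, size) {
      const n = Math.min(size, maxPerWrite);
      out.push(...new Uint8Array(buf, offset, n));
      return n;
    }
  };
}

const data = Uint8Array.from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
const sink = memSink(3);
const copied = copyContents(memSource(data), sink, 4);
```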

    /**
     * Move a file or a directory.
     *
     * Note: OS-accelerated whenever possible.
     * Note: Attempts to preserve permissions, attributes, etc. However,
     * some file systems do not support this.
     *
     * @param {string} source The name of the file/directory to move.
     * @param {string} target The name of the file/directory to be created.
     * @param {boolean} overwrite If [false] and if the target already exists, fail.
     * @throws RawFileError
     */
    move: function(source, target, overwrite)
    {
        //Unix:     maps to [rename] or, when [rename] fails with EXDEV, falls back to [Files.copy]+[Files.remove]
        //Windows:  maps to [MoveFile] http://msdn.microsoft.com/en-us/library/aa365239(v=VS.85).aspx
        //          Check if it works with directories
    },
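
The Unix strategy for move can be sketched as follows. The helper moveWithFallback and the ops object are hypothetical: they bundle the primitives so that the control flow can be shown without touching a real file system. Reporting EXDEV through an error's code property is also an assumption of this sketch, not something the proposal specifies.

```javascript
// Try a plain rename first; fall back to copy + remove only when the
// rename fails because source and target are on different devices.
function moveWithFallback(ops, source, target) {
  try {
    ops.rename(source, target);
    return "renamed";
  } catch (e) {
    if (e.code !== "EXDEV") {
      throw e; // unrelated failure: report it to the caller
    }
  }
  // Cross-device move: copy the data, then remove the original.
  ops.copy(source, target);
  ops.remove(source);
  return "copied";
}

// Demonstration with fake primitives that record what was called.
const calls = [];
const crossDeviceOps = {
  rename() {
    calls.push("rename");
    const err = new Error("Invalid cross-device link");
    err.code = "EXDEV";
    throw err;
  },
  copy() { calls.push("copy"); },
  remove() { calls.push("remove"); }
};
const outcome = moveWithFallback(crossDeviceOps, "/mnt/a/file", "/mnt/b/file");
```

Note that the fallback is not atomic: if the copy succeeds and the removal fails, both files exist afterwards.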
 
    /**
     * Remove a file.
     *
     * Note: OS-accelerated on some platforms.
     */
    remove: function(name)
    {
	//Unix:    maps to [unlink] 
	//Windows: maps to [DeleteFile]
    },

Constants

    /**
     * Return the location of the directory/folder used to store the user profile.
     *
     * Computed lazily, cached.
     *
     * @return {RawDir}
     */
    get profileDir() {
	//All platforms: use [nsIDirectoryService] to get the directory the first time, cache it.
    },
    
  
    //TODO: Add other well-known directories.

Flags

Flags for file opening

Note that these flags are split into categories for performance and portability reasons. Within each category, flags are meant to be combined with bitwise OR.

    Open: {
	/**
        * Open for reading, writing or both.
        *
	 * @enum {number}
	 */
	Access: {
           /** Open file for reading */
	    READ:   ...,
           /** Open file for writing */
	    WRITE:  ...,
	},

	/**
	 * @enum {number}
	 */
	Content: {
           /** Create file if it doesn't exist*/
	    MAY_CREATE:...,
           /** Create file; fail if the file already exists */
           MUST_CREATE:...,
           /** Write at the start of file if it exists. If not specified, append.*/
	    OVERWRITE: ...,
	},

	/**
	 * @enum {number}
	 */
	Pragma: {

           /** Windows-specific pragma: optimize cache for sequential access*/
            SEQUENTIAL_ACCESS:   ...,

           /** Windows-specific pragma: optimize cache for random access*/
	    RANDOM_ACCESS:       ...,

           /** Windows-specific pragma: do not buffer writes*/
	    WRITE_THROUGH:       ...
	}
    },
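
As an illustration of how the separate categories could be combined and consumed, the sketch below maps an access mode and a content mode to the names of POSIX open(2) flags. The numeric flag values are placeholders (the proposal leaves them unspecified), and both the helper toPosixFlagNames and the mapping itself are assumptions about what a Unix back end might do, not a description of the actual implementation.

```javascript
// Placeholder flag values: the proposal leaves the actual values unspecified.
const Open = {
  Access:  { READ: 1, WRITE: 2 },
  Content: { MAY_CREATE: 1, MUST_CREATE: 2, OVERWRITE: 4 }
};

// Return the names of the POSIX open(2) flags that a Unix back end might
// use for the given access and content modes (one possible mapping).
function toPosixFlagNames(accessMode, contentMode) {
  const names = [];
  const canRead = (accessMode & Open.Access.READ) !== 0;
  const canWrite = (accessMode & Open.Access.WRITE) !== 0;
  if (canRead && canWrite) {
    names.push("O_RDWR");
  } else if (canWrite) {
    names.push("O_WRONLY");
  } else {
    // Assumption: no access flag at all is treated as read-only.
    names.push("O_RDONLY");
  }

  if (contentMode & Open.Content.MUST_CREATE) {
    names.push("O_CREAT", "O_EXCL"); // fail if the file already exists
  } else if (contentMode & Open.Content.MAY_CREATE) {
    names.push("O_CREAT");
  }

  // Without OVERWRITE, the proposal says that writes append.
  if (canWrite && !(contentMode & Open.Content.OVERWRITE)) {
    names.push("O_APPEND");
  }
  return names;
}

// Flags of one category are or-ed together; categories stay separate.
const flags = toPosixFlagNames(
  Open.Access.READ | Open.Access.WRITE,
  Open.Content.MAY_CREATE | Open.Content.OVERWRITE);
```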

Flags for seeking in a file

    Seek: {
	/**
	 * Possible methods for seeking.
	 *
	 * @enum {number}
	 */
	Method: {
	    /**
	     * Seek from file start
	     */
	    SET: ...,

	    /**
	     * Seek from current position
	     */
	    CUR: ...,

	    /**
	     * Seek from file end
	     */
	    END: ...
	},
    },
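
The three seek methods differ only in the base position to which the offset is added. A minimal sketch, assuming placeholder values for the constants (the proposal does not specify them) and a hypothetical helper resolveSeek:

```javascript
// Placeholder values; the proposal does not specify them.
const Method = { SET: 0, CUR: 1, END: 2 };

// Compute the absolute position targeted by a seek request.
function resolveSeek(currentPosition, fileSize, delta, method) {
  let base;
  switch (method) {
    case Method.SET: base = 0; break;               // from file start
    case Method.CUR: base = currentPosition; break; // from current position
    case Method.END: base = fileSize; break;        // from file end
    default: throw new RangeError("Unknown seek method: " + method);
  }
  const target = base + delta;
  if (target < 0) {
    throw new RangeError("Cannot seek before the start of the file");
  }
  return target;
}

// 20 bytes before the end of a 100-byte file.
const fromEnd = resolveSeek(10, 100, -20, Method.END);
```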

Interfaces

File Information

    /**
     * The kind of information that can be found by calling [RawFile.stat] or [RawDir.contents].
     *
     * Note that some or all fields may be computed lazily.
     * This makes it possible to fill in fields whose value
     * has already been computed (typically, by directory
     * traversal), without performing a system
     * call to get the value of the other fields.
     *
     * @interface
     */
    FileInfo: { 
        /**
         * @return {number} milliseconds
         */
        get lastModificationTime() {
            ...
        },
        /**
         * @return {number} bytes
         */
        get size() {
            ...
        },

        /**
         * Note: this property is OS-accelerated for entries returned by [forEachFile] or by enumerating files in a directory.
         *
         * @return {boolean}
         */
        get isDirectory() {
            ...
        }
    },
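
The lazy computation described above can be sketched as follows. The factory makeFileInfo and its arguments are hypothetical: known holds the fields already obtained for free (for instance during a directory traversal), and statFn stands in for the system call that fetches all the other fields.

```javascript
// Build a FileInfo-like object whose fields are served from `known` when
// possible, and otherwise from a single, cached call to `statFn`.
function makeFileInfo(known, statFn) {
  let statResult = null;
  function field(name) {
    if (Object.prototype.hasOwnProperty.call(known, name)) {
      return known[name]; // no system call needed
    }
    if (statResult === null) {
      statResult = statFn(); // performed at most once
    }
    return statResult[name];
  }
  return {
    get lastModificationTime() { return field("lastModificationTime"); },
    get size() { return field("size"); },
    get isDirectory() { return field("isDirectory"); }
  };
}

// Demonstration: count how often the fake system call is performed.
let statCalls = 0;
const info = makeFileInfo({ isDirectory: false }, function () {
  statCalls++;
  return { lastModificationTime: 1000, size: 2048, isDirectory: false };
});
const isDir = info.isDirectory;          // answered without calling statFn
const callsAfterIsDir = statCalls;
const size = info.size;                  // first access that needs statFn
const mtime = info.lastModificationTime; // reuses the cached result
```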

Directory entries

    DirEntry: {
       /**
        * The name of the file.
        *
        * Note that there is no guarantee that the file still exists by the time you attempt to open it.
        *
        * @return {string=}
        */
       get name() { ...
       },

       /**
        * All the information that could be gathered about the file without opening it.
        *
        * @return {FileInfo}
        */
       get info() { ...
           //Precomputed by [RawDir.contents]
       }
    },

Errors

    /**
     * An exception thrown by this module.
     *
     * TODO: Specify.
     *
     * @constructor
     * @extends {Error}
     */
    Error: function(){}

Instances of RawFile

A RawFile is a low-level object wrapping a native file descriptor (under variants of Unix) or a file handle (under Windows).

Reading

    /**
     * Read some content from a file from the current position, advance.
     *
     * @param {ArrayBuffer} buf The buffer which will receive the data.
     * @param {number} offset The position in the array at which to start putting data, in bytes.
     * @param {number} size The maximum number of bytes to read. This method can read fewer bytes if (and only if) the file is shorter.
     * @return {number} The number of bytes read.
     * @throws {RawFileException} In case of file error.
     * @throws {INDEX_SIZE_ERR} In case of array error.
     */
    read: function(buf, offset, size) {
	//Unix:    [read]
	//Windows: [ReadFile]  http://msdn.microsoft.com/en-us/library/windows/desktop/aa365467%28v=VS.85%29.aspx
    },

    /**
     * As [read], but read from a given position and do not advance.
     *
     * @param {number} fileOffset The position in the file from which to read.
     * @param {ArrayBuffer} buf The buffer which will receive the data.
     * @param {number} offset The position in the array at which to start putting data, in bytes.
     * @param {number} size The maximum number of bytes to read. This method can read fewer bytes if the file is shorter.
     * @return {number} The number of bytes read.
     * @throws {RawFileException} In case of file error.
     * @throws {INDEX_SIZE_ERR} In case of array error.
     */
    pread: function(fileOffset, buf, offset, size) {
	//Unix:     [pread]
	//Windows:  [ReadFile] + [SetFilePointer] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365541(v=VS.85).aspx
    },
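
On platforms without a native positioned read, pread has to be emulated by moving the file pointer, reading, and restoring the pointer. The sketch below illustrates that idea on an in-memory stand-in. The helpers emulatedPread and memFile, the placeholder values of the seek constants, and the assumption that seek returns the new absolute position (as lseek does) are not part of the proposal.

```javascript
// Placeholder values; the proposal does not specify them.
const Seek = { SET: 0, CUR: 1, END: 2 };

// Read at `fileOffset` without moving the current position, as observed
// by the caller.
function emulatedPread(file, fileOffset, buf, offset, size) {
  const saved = file.seek(0, Seek.CUR); // remember the current position
  try {
    file.seek(fileOffset, Seek.SET);
    return file.read(buf, offset, size);
  } finally {
    file.seek(saved, Seek.SET); // restore it even if the read throws
  }
}

// In-memory stand-in for a RawFile (illustration only).
function memFile(bytes) {
  let pos = 0;
  return {
    seek(delta, method) {
      const base = method === Seek.SET ? 0
                 : method === Seek.CUR ? pos
                 : bytes.length;
      pos = base + delta;
      return pos;
    },
    read(buf, offset, size) {
      const n = Math.max(0, Math.min(size, bytes.length - pos));
      new Uint8Array(buf, offset, n).set(bytes.subarray(pos, pos + n));
      pos += n;
      return n;
    }
  };
}

const file = memFile(Uint8Array.from([10, 20, 30, 40, 50]));
file.seek(1, Seek.SET);
const out = new ArrayBuffer(4);
const count = emulatedPread(file, 3, out, 1, 2);
```

Unlike a native pread, this emulation is not atomic: anything else that shares the same file pointer could observe the intermediate position.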

Writing

    /**
     * Write some content to a file, advance.
     *
     * @param {ArrayBuffer} buf The buffer containing the data.
     * @param {number=} offset The position in the array at which the data starts, in bytes. If unspecified, 0.
     * @param {number=} size The maximum number of bytes to write. This method can write fewer bytes, depending on buffering. If unspecified, everything from [offset].
     *
     * @return {number} The number of bytes written.
     * @throws {RawFileException} In case of file error.
     * @throws {INDEX_SIZE_ERR} In case of array error.
     */
    write: function(buf, offset, size) {
	//Unix:    [write]
	//Windows: [WriteFile] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365747%28v=VS.85%29.aspx
    },
 
    /**
     * As [write], but write to a specific position and do not advance
     */
    pwrite: function(fileOffset, buf, offset, size) {
	//Unix: [pwrite]
	//Windows: [WriteFile] + [SetFilePointer]
    },

Attributes

    /**
     * Gather information about the file
     *
     * @return {Files.FileInfo} information about the file.
     */
    stat: function() {
        //Unix:    [fstat]
	//Windows: [GetFileInformationByHandle] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=VS.85).aspx
    },

    /**
     * Set the size of the file
     *
     * @param {number} newSize The size to give to the file.
     */
    setSize: function(newSize) {
        //Unix:    [ftruncate]
	//Windows: [SetFileValidData] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365544%28v=VS.85%29.aspx
    },

Misc

    /**
     * Change the position in the current file
     *
     * @param {number} delta Number of bytes. Can be positive or negative.
     * @param {RawFile.Seek.Method} method Determines whether [delta] is taken from the start of the file, from the end, or from the current position.
     */
    seek: function(delta, method) {
	//Unix:    [lseek]
	//Windows: [SetFilePointer]
    },

    /**
     * Close a file descriptor.
     *
     * Any further operation on that file descriptor will throw an exception.
     */
    close: function() {
	//Unix:    [close]
	//Windows: [CloseHandle]
    },

    /**
     * Flush the buffer
     */
    flush: function() {
	//Unix:    [fsync]
	//Windows: [FlushFileBuffers] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364439(v=VS.85).aspx
    },

Instances of RawDir

A RawDir is an object wrapping either a directory name or a directory descriptor, depending on the platform. On the Unix side, some of the methods rely on functions introduced by recent versions of POSIX, such as openat; on systems that lack them, these functions have to be reimplemented.

Opening/creating

    /**
     * Open a file from a directory
     *
     * @param {string} leafName The name of the file.
     * @param {number=} accessMode A bitwise OR of flags, as specified by [RawFile.Open.Access]. If this argument is not provided, we assume 0, i.e. no flags.
     * @param {number=} contentMode A bitwise OR of flags, as specified by [RawFile.Open.Content]. If this argument is not provided, we assume 0, i.e. no flags.
     * @param {number=} pragmaMode A bitwise OR of flags, as specified by [RawFile.Open.Pragma]. If this argument is not provided, we assume 0, i.e. no flags.
     * @return {RawFile} a RawFile
     *
     * @throws RawFileError
     */
    openFile: function(leafName, accessMode, contentMode, pragmaMode) {
	//Linux:     [openat]
       //Unix:      decide between gnulib [openat] and simply [open]
	//Windows:  cf. [RawFile.open]
    },

    /**
     * Create a temporary file in this directory. This file is deleted when the process closes or when the file is closed, whichever happens first.
     */
    createTempFile: function() {
	//Unix:        uses [mkstemp] and [this.openFile]
	//Windows:     maps to [GetTempFileName] + [CreateFile] http://msdn.microsoft.com/en-us/library/windows/desktop/aa363875%28v=vs.85%29.aspx
    },

    /**
     * Open a subdirectory of this directory.
     *
     * @param {string} leafName The platform-specific name of the directory.
     * @param {number=} accessMode A bitwise OR of flags, as specified by [RawFile.OpenDir.Access]
     *
     * @returns {RawDir} a descriptor which may be used to access this directory
     */
    openDirectory: function(leafName, accessMode) {
	//Unix:    lazy -- may call [openat]
	//Windows: lazy
    },

    /**
     * Create a temporary subdirectory.
     *
     * Note: For the time being, there is no guarantee that the temporary directory will be cleaned up.
     *
     * @returns {RawDir} a descriptor which may be used to access this directory
     */    
    createTempDirectory: function()
    {
    },

Information

    /**
     * Gather information about the directory
     *
     * @return {Files.FileInfo} information about the directory.
     */
    stat: function() {
	//Unix:    [lstat]
	//Windows: [GetFileInformationByHandle] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=VS.85).aspx
    },

Browsing contents

    /**
     * Get the contents of the directory.
     *
     * @param {string=} filter If provided, uses OS-accelerated, platform-specific filtering, where available.
     * @return {Array.<Files.DirEntry>} The list of files of the directory that match the filter.
     */
    contents: function(filter) {     
        //Unix:    maps to [opendir], [dirfd], [readdir]/[readdir64], lazy calls to [stat], lazy calls to [openat]/[open], [closedir]
        //Windows: maps to [FindFirstFile], [FindNextFile], [FindClose]
    }

Not implemented

  • chmod, chown -- very different between platforms - might implement platform-specific functions
  • select, poll, ... -- very different between platforms, higher level
  • mmap -- probably feasible, just might require additional API
  • locking -- very different between platforms, most likely deserves its own API
  • linking -- very different between platforms
  • readString, writeString -- ArrayBuffer <-> String conversion most likely deserves its own API
  • opening a file or directory from a full path -- error-prone, difficult to optimize, favors hardcoding non-portable paths -- also, we intend to use this API mostly to access files in well-known directories;
  • accessing the temporary directory -- it doesn't exist on Android, and emulating would require heavy scaffolding.

Implementation notes

  • For the moment, the JS team does not recommend using js-ctypes for performance-critical code. They recommend using JSAPI instead, so this is probably the right way to go.
  • This is JS code, so by definition not thread-safe.
  • Depending on demands by API users, a C++ version may be produced. In this case, we will probably want to make it MT-safe.