I am writing a device driver for Linux 2.6.26. I want a DMA buffer mapped into userspace so the driver can send data to a userspace application. Can anyone suggest a good tutorial on this?



Here is what I have used, in brief...

1. Call get_user_pages (with current->mm->mmap_sem held for reading) to pin the user page(s). It gives you an array of struct page * pointers.

2. Call dma_map_page on each struct page * to get the DMA address (aka "I/O address") for that page. This also creates an IOMMU mapping, if your platform needs one.

3. Tell your device to perform the DMA into the memory using those DMA addresses. Obviously they can be non-contiguous: memory is only guaranteed to be contiguous within a single page.

4. Call dma_sync_single_for_cpu to do any necessary cache flushes, bounce-buffer copying, or whatever else your platform requires. This call guarantees that the CPU can actually see the result of the DMA, since on many systems modifying physical RAM behind the CPU's back leaves stale caches. (dma_unmap_page also implies this sync, so the explicit call matters most when you keep a mapping around and reuse it.)

5. Call dma_unmap_page to free the IOMMU mapping (if your platform needed one).

6. Call put_page to unpin the user page(s). Since the device wrote into them, mark each page dirty first (e.g. with set_page_dirty_lock) so the kernel knows its contents changed.
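Because DMA addresses are only guaranteed to be contiguous within a page, the steps above amount to splitting the user buffer into per-page pieces: pin every page it touches, then map each piece with its own dma_map_page call. The plain-C sketch below shows just that arithmetic so it can be compiled and checked in userspace; struct dma_seg, pages_spanned and split_into_segments are hypothetical names, and the kernel calls appear only in comments.

```c
#include <stddef.h>
#include <stdint.h>

/* One DMA segment: the part of a user buffer that lives in one page. */
struct dma_seg {
    size_t page_index;  /* index into the struct page * array */
    size_t offset;      /* byte offset within that page */
    size_t len;         /* bytes of the buffer inside that page */
};

/* Number of pages the buffer [uaddr, uaddr + len) touches; this is the
 * page count you would ask get_user_pages() to pin. */
size_t pages_spanned(uintptr_t uaddr, size_t len, size_t page_size)
{
    if (len == 0)
        return 0;
    uintptr_t first = uaddr / page_size;
    uintptr_t last = (uaddr + len - 1) / page_size;
    return (size_t)(last - first + 1);
}

/* Split the buffer into per-page segments. In a driver, each segment would
 * be mapped with
 *   dma_map_page(dev, pages[seg.page_index], seg.offset, seg.len, dir);
 * Returns the number of segments written, or 0 if out[] is too small. */
size_t split_into_segments(uintptr_t uaddr, size_t len, size_t page_size,
                           struct dma_seg *out, size_t max_segs)
{
    size_t n = pages_spanned(uaddr, len, page_size);
    if (n == 0 || n > max_segs)
        return 0;

    size_t offset = uaddr % page_size;  /* only the first page can start mid-page */
    for (size_t i = 0; i < n; i++) {
        size_t room = page_size - offset;
        out[i].page_index = i;
        out[i].offset = offset;
        out[i].len = len < room ? len : room;
        len -= out[i].len;
        offset = 0;                     /* every later page starts at offset 0 */
    }
    return n;
}
```

For example, a 32-byte buffer at address 0x1ff0 with 4 KiB pages spans two pages, so split_into_segments yields 16 bytes at offset 0xff0 of the first page and 16 bytes at offset 0 of the second.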

Note that you must check for errors all the way through, because resources are limited all over the place. get_user_pages returns a negative number (-errno) for an outright error, but it can also return a positive number smaller than the one you requested, telling you how many pages it actually managed to pin (physical memory is not limitless). In that case you still must loop over every page it did pin and call put_page on it; otherwise you leak kernel memory, which is very bad.
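The short-pin rule is easy to get wrong, so here it is as a plain-C model that runs outside the kernel. struct fake_page, fake_get_user_pages and fake_put_page are hypothetical stand-ins for struct page, get_user_pages and put_page (the real calls take more arguments); only the cleanup logic in pin_all_or_nothing is the point. Treating a short pin as failure is one possible policy; a driver could instead carry on with the pages it got.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page: only tracks how often it is pinned. */
struct fake_page {
    int pincount;
};

/* Hypothetical model of get_user_pages(): pins at most `available` of the
 * `nr` requested pages and returns how many it pinned. (The real call can
 * also return -errno.) */
long fake_get_user_pages(struct fake_page *pages, long nr, long available)
{
    long got = nr < available ? nr : available;
    for (long i = 0; i < got; i++)
        pages[i].pincount++;
    return got;
}

/* Model of put_page(): drop one pin. */
void fake_put_page(struct fake_page *p)
{
    p->pincount--;
}

/* Pin all nr pages or none: on a short pin, release exactly the pages
 * that were pinned (pages[0..ret-1]) and report failure. */
int pin_all_or_nothing(struct fake_page *pages, long nr, long available)
{
    long ret = fake_get_user_pages(pages, nr, available);
    if (ret < 0)
        return (int)ret;                /* outright error: nothing is pinned */
    if (ret < nr) {
        for (long i = 0; i < ret; i++)
            fake_put_page(&pages[i]);   /* undo the partial pin */
        return -1;                      /* a driver might return e.g. -EFAULT */
    }
    return 0;
}
```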

dma_map_page can also fail, because IOMMU mappings are another limited resource. It does not return -errno, though: it returns a dma_addr_t, which you must check with dma_mapping_error().

dma_unmap_page and put_page return void, as is usual for Linux "freeing" functions. (Linux kernel resource-management routines only return errors when something actually went wrong, not when you screwed up and passed a bad pointer. The basic assumption is that you never screw up, because this is kernel code. The exception is get_user_pages, which does check that the user addresses are valid and returns an error if the user handed you a bad pointer.)

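You can also use the _sg functions if you want a friendlier interface to scatter/gather: call dma_map_sg instead of dma_map_page, dma_sync_sg_for_cpu instead of dma_sync_single_for_cpu, and so on. Note that dma_map_sg returns the number of entries actually mapped (possibly fewer than you passed in, if the IOMMU merged some) and 0 on failure.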

Also note that many of these functions may be more-or-less no-ops on your platform, so you can often get away with being sloppy. (In particular, dma_sync_... and dma_unmap_... do nothing on my x86_64 system.) But on those platforms, the calls themselves get compiled into nothing, so there is no excuse for being sloppy.


OK, this is what I did. Disclaimer: I'm a hacker in the pure sense of the word and my code ain't the prettiest. I read LDD3, the Infiniband source code, and other predecessor material, and decided that get_user_pages, pinning the pages, and all the rest of that rigmarole were just too painful to contemplate while hungover. Also, I was working with the person on the other side of the PCIe bus, and I was also responsible for "designing" the userspace application.

I wrote the driver so that at load time it preallocates as many buffers as it can, at the largest size it can get, by calling myAddr[i] = pci_alloc_consistent(blah, size, &pci_addr[i]) until it fails (on failure it returns NULL). I was able to allocate around 2.5GB of buffers, each 4MiB in size, on my meagre machine, which only has 4GiB of memory. The total number of buffers of course varies depending on when the kernel module is loaded: load the driver at boot time and you get the most buffers. Each individual buffer's size maxed out at 4MiB on my system, most likely because of the buddy allocator's limit: with the default MAX_ORDER of 11 and 4KiB pages, the largest single allocation is 2^10 pages = 4MiB. I catted /proc/buddyinfo to make sure I wasn't doing anything stupid, which is of course my usual starting pattern.

The driver then hands the array of pci_addr, along with the buffer sizes, to the PCIe device and just sits there waiting for the interrupt storm to begin. Meanwhile, in userspace, the application opens the driver, queries the number of allocated buffers (n) and their sizes (using ioctls, reads, etc.), and then calls mmap() n times. Of course mmap() must be properly implemented in the driver; LDD3 pages 422-423 were handy. Userspace now has n pointers to n areas of driver memory. As the driver is interrupted by the PCIe device, it's told which buffers are "full" or "available" to be sucked dry. The application, in turn, is blocked on a read() or ioctl() waiting to be told which buffers are full of useful data.
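On the userspace side, the "call mmap() n times" step might look like the sketch below. It assumes a convention the answer doesn't spell out: buffer i is requested at file offset i * buf_size, and the driver's mmap method (not shown) picks the buffer from vma->vm_pgoff. buffer_offset and map_all_buffers are hypothetical helper names, and how n and buf_size are obtained (ioctl, read) is left out.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>

/* Hypothetical convention: buffer idx lives at file offset idx * buf_size,
 * so the driver can recover the index as vma->vm_pgoff / (buf_size >> PAGE_SHIFT).
 * buf_size must be a multiple of the page size, because mmap offsets
 * have to be page-aligned. */
off_t buffer_offset(unsigned idx, size_t buf_size)
{
    return (off_t)idx * (off_t)buf_size;
}

/* Map n driver buffers of buf_size bytes each from the open device fd.
 * On failure, unmaps whatever was already mapped and returns -1. */
int map_all_buffers(int fd, unsigned n, size_t buf_size, void **bufs)
{
    for (unsigned i = 0; i < n; i++) {
        bufs[i] = mmap(NULL, buf_size, PROT_READ, MAP_SHARED, fd,
                       buffer_offset(i, buf_size));
        if (bufs[i] == MAP_FAILED) {
            while (i-- > 0)             /* undo mappings 0 .. i-1 */
                munmap(bufs[i], buf_size);
            return -1;
        }
    }
    return 0;
}
```

A caller would open the device node, query n and the buffer size, and then call map_all_buffers(fd, n, size, bufs). PROT_READ is enough if the application only consumes the data the device wrote.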
The tricky part was managing the userspace/kernelspace synchronization so that buffers being DMA'd into by the PCIe device are not also being modified by userspace, but that's what we get paid for. I hope this makes sense, and I'd be more than happy to be told I'm an idiot, but please tell me why. I also recommend this book, by the way: http://www.amazon.com/Linux-Programming-Interface-System-Handbook/dp/1593272200 . I wish I'd had it seven years ago when I wrote my first Linux driver. Another kind of trickery is possible: add even more memory, don't let the kernel use it, and mmap it on both sides of the userspace/kernelspace divide. But then the PCI device must also support DMA addressing beyond 32 bits. I haven't tried it, but I wouldn't be surprised if I'm eventually forced to.
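One way to handle that synchronization is to give every buffer exactly one owner at any moment and only move it between owners inside the driver. The sketch below is a hypothetical design, not the answer's actual code: the state names and functions are made up, and it is written as plain C so the transitions can be tested in userspace. In a real driver the state array would be protected with a spinlock taken via spin_lock_irqsave, since the interrupt handler and the read()/ioctl() path both modify it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ownership states for one DMA buffer. Only one party owns a
 * buffer at a time, which keeps userspace away from buffers under DMA. */
enum buf_state {
    BUF_DEVICE,  /* handed to the device; DMA may be in progress */
    BUF_FULL,    /* device reported completion; waiting for the app */
    BUF_USER     /* app has been told about it and is reading it */
};

/* Interrupt path: the device says buffer idx is full. */
bool on_dma_complete(enum buf_state *st, size_t idx)
{
    if (st[idx] != BUF_DEVICE)
        return false;           /* device reported a buffer it didn't own */
    st[idx] = BUF_FULL;
    return true;
}

/* read()/ioctl() path: hand the first full buffer to the app.
 * Returns its index, or -1 if none is ready (a real driver would sleep). */
long claim_full_buffer(enum buf_state *st, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (st[i] == BUF_FULL) {
            st[i] = BUF_USER;
            return (long)i;
        }
    }
    return -1;
}

/* Hypothetical "give back" ioctl: the app is done with buffer idx. */
bool release_buffer(enum buf_state *st, size_t idx)
{
    if (st[idx] != BUF_USER)
        return false;           /* app released a buffer it didn't own */
    st[idx] = BUF_DEVICE;       /* the driver would re-post it to the device here */
    return true;
}
```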


Well, if you have LDD, you can have a look at chapter 15, and more precisely page 435, where Direct I/O operations are described.

The kernel call that will help you achieve this is get_user_pages. In your case, since data flows from the kernel to userspace (the device writes into user memory), you should set the write flag to 1.

Also be aware that asynchronous I/O may let you achieve the same result without your userspace application having to wait for the read to finish, which can be better.


Take a good look at the Infiniband drivers. They go to great lengths to make zero-copy DMA and RDMA to user space work.

I forgot to add this before saving:

Doing DMA directly to user-space memory mappings is full of problems, so unless you have very high performance requirements like Infiniband or 10 Gb Ethernet, don't do it. Instead, copy the DMA'd data into the userspace buffers. It will save you much grief.

For just one example, what if the user's program exits before the DMA is complete? What if the user memory is reallocated to another process after exit but the hardware is still set to DMA into that page? Disaster!
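If you take the advice to copy instead, the driver's read() method just copies the valid part of the DMA buffer out while respecting the file position. The sketch below models that in userspace, with memcpy standing in for copy_to_user() (which in the kernel can fail and must be checked). read_from_dma_buf, dma_buf and avail are hypothetical names; if your kernel provides simple_read_from_buffer(), it implements essentially this clamping logic for you.

```c
#include <string.h>
#include <sys/types.h>

/* Userspace model of a driver read() that copies DMA'd data out instead of
 * mapping it. dma_buf is the DMA buffer, avail the number of valid bytes
 * the device reported, *ppos the file position (loff_t in the kernel). */
ssize_t read_from_dma_buf(char *ubuf, size_t count, long long *ppos,
                          const char *dma_buf, size_t avail)
{
    long long pos = *ppos;
    if (pos < 0)
        return -1;                      /* the kernel would return -EINVAL */
    if ((unsigned long long)pos >= avail || count == 0)
        return 0;                       /* end of valid data */
    if (count > avail - (size_t)pos)
        count = avail - (size_t)pos;    /* never read past the valid bytes */
    memcpy(ubuf, dma_buf + pos, count); /* copy_to_user() in a real driver */
    *ppos = pos + (long long)count;
    return (ssize_t)count;
}
```

Reading 4 bytes at a time from a 6-byte buffer returns 4, then 2, then 0, advancing the position to 4 and then 6.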


The remap_pfn_range function (called from the driver's mmap method) can be used to map kernel memory into user space.

A real example can be found in the mem character driver, drivers/char/mem.c.
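The core of a remap_pfn_range-based mmap method is small: check that the range userspace asked for (vma->vm_end - vma->vm_start bytes, starting vma->vm_pgoff pages into the mapping) fits inside your buffer, then pass the buffer's first page frame number plus vma->vm_pgoff. The plain-C model below does only that arithmetic; mmap_pfn_for, buf_phys and buf_size are hypothetical names, and it assumes a physically contiguous buffer whose physical address you already know.

```c
#include <stdint.h>

/* Model of the checks a driver mmap method makes before calling
 *   remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot);
 * vm_start/vm_end/vm_pgoff mirror the vma fields; buf_phys/buf_size
 * describe a physically contiguous kernel buffer.
 * Returns 0 and stores the first pfn to map, or -1 if the request falls
 * outside the buffer (the driver would return -EINVAL). */
int mmap_pfn_for(uint64_t vm_start, uint64_t vm_end, uint64_t vm_pgoff,
                 uint64_t buf_phys, uint64_t buf_size, unsigned page_shift,
                 uint64_t *pfn)
{
    uint64_t size = vm_end - vm_start;      /* bytes userspace asked for */
    uint64_t off = vm_pgoff << page_shift;  /* byte offset into the buffer */
    if (off > buf_size || size > buf_size - off)
        return -1;
    *pfn = (buf_phys >> page_shift) + vm_pgoff;  /* first page frame to map */
    return 0;
}
```

drivers/char/mem.c is the degenerate case: for /dev/mem the file offset is itself the physical address, so its mmap method passes vma->vm_pgoff straight through as the pfn.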